Recent Results in Signal Processing and Communication


Foreword 

This special issue consists of nine papers which are selected from among those presented 
at the Conference on Signal Processing and Communication, held at the Indian Institute of
Science, Bangalore, during 9-12 August 1995. These papers report some new results in
the fields of signal processing and communication. Together, they cover a wide spectrum
and point to current directions in signal processing research, namely, time varying signal
modelling, nonlinear modelling and processing, signal compression, and applications of
DSP in communication by way of channel equalization, error correction, and
modulation/demodulation.

An essential aspect of signal processing is modelling and analysis of signals. The issue 
begins with two papers addressing the characterization of two specific signals, speech and 
electroencephalogram (EEG). Speech signals contain information about the time varying 
characteristics of the excitation source and the vocal tract system. In time-frequency analysis
of speech signals, the size and position of the time-domain window are very important.
The first paper by Yegnanarayana proposes an event-based approach for analysing speech 
signals. Defining the instants of significant excitation of the vocal tract system as events, 
the paper describes a method for extracting these events and also shows how the knowledge 
of these events can be used for various applications in speech processing. EEG is also a 
time varying signal whose source is the electrical activity of the brain. The progress in 
the theory of nonlinear dynamics of chaos has made it possible to think of modelling the 
brain as a continuous, spontaneously changing nonlinear dynamical system. The recently 
developed method of phase-space representation can show the dynamics of a system as it 
evolves with time. Attractor dimension is an important measure in characterizing the nonlinear
dynamical system. The second paper by Pradhan et al proposes the application of
singular value decomposition to the phase-space trajectory for calculation of the attractor 
dimension of EEG signals. 

Like nonlinear modelling, nonlinear signal processing is also becoming a powerful tool 
for various applications. The foremost of such applications is image processing, where 
mathematical morphology is being used for processing and analysis of topological and 
geometrical aspects of an image. The third paper by Singh and Siddiqi presents a theory 
for morphological processing of multichannel images within the abstract framework of 
lattice theory. Multichannel processing deals with processing of colour and multispectral 
(satellite) images. The paper shows that matrix morphology formulation is a natural consequence
of marginal ordering and serves as a technique for processing of multichannel
images. 

Compression of signals is essential for efficient communication. Conventional lossless 
image compression schemes employ a decorrelation technique such as DPCM followed 
by an entropy coding scheme such as Huffman coding. The fourth paper by Damodare et
al proposes a radically different approach based on finding the minimal Boolean function 
representation for sub-blocks of an image bit plane. Their lossless compression scheme 
uses linear prediction as a preprocessing step and yields a compression ratio comparable to 
that of the JPEG lossless compression standard. Their lossy compression scheme involves 
reduction of the number of bit planes and application of the lossless technique to the 
residual bit planes. Sub-band coding, a well-known lossy compression scheme, uses a class
of filter banks known as paraunitary filter banks, which have received particular attention
recently. In several applications such as image compression, it is desirable for each
filter in the bank to have linear phase. Further, a filter bank with pairwise mirror-image
symmetry in the frequency domain requires fewer parameters. The fifth paper by Soman 
presents a minimal and complete factorization for filter banks that satisfy all the three 
properties, namely, paraunitariness, linear phase, and pairwise mirror-image symmetry, 
simultaneously. 

Efficient data communication across non-ideal and noisy channels calls for channel
equalization and error correcting codes. In channel equalization, a filter or a signal
processing algorithm, called an equalizer, is used at the receiver to recover the transmitted
data. The sixth paper by Gracias and Reddy presents a wavelet packet based channel
equalization. The proposed method exploits the fact that for sufficiently narrowband sequences,
a non-ideal channel can be modelled as an attenuation and delay. The equalization problem
then reduces to that of determining the delay introduced by the channel for the sequence.
The authors propose the use of suitably designed wavelet packets as narrowband sequences
and develop an algorithm to determine the delay introduced by the channel for each of
the wavelet packets. Error correcting codes, on the other hand, are employed to build
noise immunity during data communication. The codewords used are vectors of a particular
dimension over some finite alphabet. The seventh paper by Joseph and Makur treats
the problem of designing such block codes of a certain dimension as one of forming clusters
in the space of that dimension. The authors propose a deterministic annealing method
for designing the codes. While conventional deterministic annealing makes use of the
squared Euclidean distance measure, their paper develops an algorithm which can be
used for clustering with Hamming distance as the distance measure.

Digital signal processing contributes to data communication in other ways, too. The 
last two papers use DSP algorithms and DSP chips to design efficient modulation and 
demodulation schemes for minimum shift keying (MSK). In recent years, MSK and its 
variants have become increasingly popular modulation techniques for signaling through 
bandwidth- and amplitude-limited channels. Gaussian MSK is employed in many mobile 
and personal communication systems. It permits the use of a variety of receivers, ranging 
from the coherent to the digital FM type, each with a different complexity/performance 
trade-off. The digital FM modem has been in use for many years, but its performance is poor 
when compared to the best possible receiver, e.g., a coherent receiver. The eighth paper by 
Ramamurthi et al describes an improved detection scheme for digital FM reception. The 
improved algorithm is based entirely on digital signal processing techniques and requires 
only a small increase in complexity. The last paper by Kumar and Reddy presents the 
design and implementation details of an MSK modem using a novel transmitter and receiver 
structure. There is an ever-increasing demand for low cost, high speed data communication 
systems in a variety of applications such as VSATs, personal communication systems, and 
radio paging. The authors exploit the features and architecture of modern low-cost DSP
chips in their design and implementation.
On timing in time-frequency analysis of speech signals

B YEGNANARAYANA 

Department of Computer Science and Engineering, Indian Institute of 
Technology, Madras 600 036, India 

Abstract. The objective of this paper is to demonstrate the importance of the
position of the analysis time window in time-frequency analysis of speech signals.
Speech signals contain information about the time varying characteristics
of the excitation source and the vocal tract system. Resolution in both the temporal
and spectral domains is essential for extracting the source and system
characteristics from speech signals. It is not only the resolution, as determined
by the analysis window in the time domain, but also the position of the window
with respect to the production characteristics that is important for accurate analysis
of speech signals. In this context, we propose an event-based approach for the
analysis of speech signals. We define the occurrence of events at the instants corresponding
to significant excitation of the vocal tract system. Knowledge of these instants
enables us to place the analysis window suitably for extracting the characteristics
of the excitation source and the vocal tract system even from short segments
of data. We present a method of extracting the instants of significant excitation
from speech signals. We show that with the knowledge of these instants
it is possible to perform prosodic manipulation of speech and also an accurate
analysis of speech for extracting the source and system characteristics.

Keywords. Time-frequency analysis; group delay; glottal vibration; prosodic 
manipulation; voice source; vocal tract system. 


1. Introduction 

Many natural signals contain useful spectral information over a range of frequencies, and
these spectral features may vary with time. In order to perform spectral analysis, a
windowed time domain signal is chosen. The size of the window is dictated by the desired
resolution in the frequency domain. The shape of the window is dictated by the edge
effects due to abrupt termination of the signal. The effect of windows on time-frequency analysis
of signals has been well studied in the literature (Harris 1978). Usually shorter windows
provide better temporal resolution but are suitable only for high frequency signals. On
the other hand, longer windows provide better frequency resolution but mask features due
to fast temporal variations. Several methods have been proposed addressing the issues of
adapting the effective window size for achieving temporal and spectral resolutions
simultaneously (Hlawatsch & Boudreaux-Bartels 1992). Most of the time-frequency analysis
methods consider mainly the issue of resolution. But the positioning of the window relative
to the signal has significant influence on the results of analysis (Rabiner et al 1977). For
example, if the analysis window contains a transition from one steady state to another, it
is difficult to associate a steady system with the derived spectrum. Moreover, in order to
examine the variability of the system over a period of time, it is necessary that the system
characteristics are extracted from steady portions in the region. That means that the analysis
window has to be positioned suitably with respect to the signal. We call this positioning
of the window timing in time-frequency analysis.

Figure 1. Speech signal and the corresponding glottal waveform showing the open and
closed regions. (a) Speech waveform. (b) Glottal waveform.

The timing becomes crucial for the time-frequency analysis of speech signals. The
objective in speech analysis is to extract the characteristics of the excitation source and the
vocal tract system from speech signals. Since both the source and the system vary during
production of speech, it is necessary to use short windows for analysis. Note that even within
a pitch period of voiced speech, the source is not steady due to glottal vibration. Also, the
vocal tract system is not steady due to interaction between source and system. Therefore,
even pitch synchronous analysis of voiced speech does not overcome the problem of
position effects of the analysis window mentioned above. The pitch synchronous placement
of the window does not guarantee that the signal within the window corresponds to a steady
system (Parthasarathy & Tufts 1987).

In voiced speech the vocal tract system is excited by a periodic glottal vibration. The
glottal vibration is due to slow opening and sudden closing of the vocal folds, followed by
a closed phase during each pitch period (see figure 1). Significant excitation of the vocal
tract system takes place during the rapid closing part of the glottal vibration. The vocal
tract system characteristics are preserved in the signal in the closed glottis region, whereas the
vocal tract resonances are damped significantly during the opening phase of the glottal
vibration, when the trachea is coupled to the vocal tract system. Thus the vocal tract
system characteristics are significantly different during the open and closed phases of the
glottal vibration. In addition, the excitation source will have a turbulent noise component
due to air passing through the narrow constriction created during the closing phase of
the glottal vibration. Thus a straightforward analysis of the speech signal, even using a
pitch synchronous window, is not likely to bring out the time varying source and system
characteristics.

In order to represent the time varying vocal tract system characteristics across several
pitch periods, it is necessary to use an analysis window enclosing the steady vocal tract
system region in each pitch period, and compare the system characteristics derived from
similar regions for all pitch periods. There are two steady regions in each pitch period,
one corresponding to the closed glottis region, and the other to the open glottis region.
Thus it is necessary to identify these regions first and then select a suitable window within
each region for analysis. One way to do this is to first determine the instant of significant
excitation in each pitch period. Then a short (2-3 ms) segment to the right of the instant
can be considered the closed phase region and a short (2-3 ms) segment to the left of the
instant the open phase region. Even though this division provides only an approximation
to the selection of the regions, analysis of the vocal tract system corresponding to,
say, the closed phase region across successive pitch periods will bring out the
time varying vocal tract resonance characteristics accurately. Moreover, knowledge
of the instants of significant excitation also permits an accurate analysis of the source
characteristics.
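The region selection just described can be sketched as follows, assuming the instants are already available as sample indices. The function name `glottal_regions`, the 2.5 ms segment length (one value inside the 2-3 ms range above) and the 10 kHz sampling rate (the rate used for the speech data in §3.2) are illustrative assumptions, not the author's code.

```python
def glottal_regions(instants, n_samples, fs=10000, seg_ms=2.5):
    """Return (closed, open) sample ranges around each excitation instant.

    Hypothetical helper: the closed phase is taken as the seg_ms just after
    the instant and the open phase as the seg_ms just before it. Ranges are
    half-open (start, stop) and clipped to [0, n_samples].
    """
    seg = int(round(seg_ms * fs / 1000))   # segment length in samples
    regions = []
    for t in instants:
        closed = (t, min(t + seg, n_samples))   # right of the instant
        opened = (max(t - seg, 0), t)           # left of the instant
        regions.append((closed, opened))
    return regions
```

For example, `glottal_regions([100, 180], n_samples=200)` yields 25-sample ranges on either side of each instant, with the last closed-phase range clipped at the end of the signal.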

We call these instants of significant excitation of the vocal tract system events, and the
analysis based on the knowledge of these events event-based analysis. It is obvious that
event-based analysis will provide better results than the results from arbitrary placement of 
windows for analysis. But it is necessary to determine these events first before performing 
time-frequency analysis. 

In this paper we describe a method for extracting the instants of significant excitation 
from speech signals, and show how the knowledge of these instants can be used for various 
applications in speech processing. In §2 we discuss the signal processing principles used for 
extraction of the instants of significant excitation. In §3 we describe a method of extraction 
of these instants from speech signals. We show in §4 how the knowledge of these extracted 
instants can be used for prosodic manipulation, which involves modifying the pitch periods 
and durations of various speech segments in a predetermined manner without modifying 
the vocal tract system characteristics. In §5 we give some results of analysis of the source 
characteristics by using two successive pitch periods at a time. In §6 we demonstrate the 
use of the instants of significant excitation for extracting the time varying characteristics 
of the vocal tract system in the closed and open glottis regions separately from successive 
pitch periods. 

2. Principle of the proposed method of extraction of instants of significant 
excitation (Yegnanarayana & Smits 1995) 

Consider a unit sample sequence with the sample at $t = \tau$, as shown in figure 2a. Its
Fourier transform (FT) is $e^{-j\omega\tau}$, and hence the FT phase is $\phi(\omega) = -\omega\tau$.
Thus the derivative of the phase, or the negative group delay, is $\phi'(\omega) = -\tau$. As the
analysis window, centred around the origin $t = 0$ and enclosing the unit sample, is moved
to the right, $\phi'(\omega)$ will vary linearly with time, crossing through zero at $t = \tau$, as shown
in figure 2d. The plot in figure 2d is called the phase slope function. If we consider the unit
sample response (shown in figure 3a) of a second order all-pole system corresponding to a
damped resonator, the average of $\phi'(\omega)$ will be equal to $-\tau$, the negative of the delay of
the unit sample from the origin. By moving the analysis window to the right, the average
$\phi'(\omega)$ will increase linearly with time, passing through zero at $t = \tau$, as shown by the
phase slope function plot in figure 3c. Thus for each analysis window position it is necessary
to compute the group delay and to find the average of the group delay spectrum. The group
delay spectrum is computed as follows.

Figure 2. Illustration of group delay properties of a unit sample. (a) Unit sample
sequence. (b) FT phase of unit sample sequence. (c) Derivative of FT phase. (d) Phase
slope function.

Let $x(n)$, $n = 0, 1, \ldots, N-1$, be the given signal in the analysis window, and let
$y(n) = n\,x(n)$, $n = 0, 1, \ldots, N-1$. Then the derivative of the FT phase, or negative group
delay, is given by (Oppenheim & Schafer 1989, ch. 12)

$$\phi'(\omega) = -\frac{X_R(\omega)\,Y_R(\omega) + X_I(\omega)\,Y_I(\omega)}{X_R^2(\omega) + X_I^2(\omega)} \tag{1}$$

where the FT of $x(n)$ is $X(\omega) = X_R(\omega) + jX_I(\omega)$ and the FT of $y(n)$ is
$Y(\omega) = Y_R(\omega) + jY_I(\omega)$. The derivative of the FT phase $\phi'(\omega)$ is computed from the
discrete Fourier transforms of $x(n)$ and $y(n)$ using a suitable DFT order, preferably
greater than twice the size of the analysis window.
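Equation (1) translates almost directly into array code. The sketch below assumes NumPy; the function name `phase_derivative` and the default DFT order (the smallest power of two greater than twice the window length) are our own choices, not taken from the paper.

```python
import numpy as np

def phase_derivative(x, nfft=None):
    """phi'(omega) of the windowed signal x(n) via equation (1).

    Returns values at the nfft//2 + 1 non-negative DFT frequencies.
    """
    x = np.asarray(x, dtype=float)
    n_samples = len(x)
    if nfft is None:
        # DFT order greater than twice the window size (power of two: assumption)
        nfft = 1 << int(np.ceil(np.log2(2 * n_samples + 1)))
    n = np.arange(n_samples)
    X = np.fft.rfft(x, nfft)          # FT of x(n)
    Y = np.fft.rfft(n * x, nfft)      # FT of y(n) = n x(n)
    denom = X.real ** 2 + X.imag ** 2
    denom = np.maximum(denom, np.finfo(float).tiny)   # guard against spectral zeros
    return -(X.real * Y.real + X.imag * Y.imag) / denom
```

As a check, a unit sample at $n = 5$ gives $\phi'(\omega) = -5$ at every frequency, matching $\phi'(\omega) = -\tau$ for a delayed unit sample.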

Figure 3. Illustration of group delay properties of a damped sinusoidal signal.
(a) Delayed damped sinusoidal signal. (b) Derivative of the FT phase. (c) Phase
slope function.

Figure 4. Phase slope function for a periodic impulse sequence. (a) Periodic
impulse sequence. (b) Phase slope function for a window size of 6.4 ms. (c) Phase
slope function for a window size of 12.8 ms.

Since $\phi'(\omega)$ is computed using a windowed signal, and also since it is available only at
discrete frequencies, there will be large fluctuations in the computed values of $\phi'(\omega)$. To
obtain the average value from the discrete values of $\phi'(\omega)$, the function is smoothed using
a 3-point median filter to eliminate large fluctuations, and then the mean value of the
smoothed $\phi'(\omega)$ is computed. Due to computational inaccuracies, there may be some error
in the resulting estimate of the mean value of $\phi'(\omega)$. The mean value $\bar{\phi}'$ is estimated at
each instant by shifting the analysis window one sample at a time. We call the resulting
function, $\bar{\phi}'$ versus time, the phase slope function. The positive
zero crossing instant of the phase slope function gives the instant of significant excitation
in the analysis window. It is also interesting to note that the phase slope function does not
depend on the phase characteristics of the system, as long as the system is minimum
phase. This is because a minimum phase system excited by an impulse at $t = 0$
has zero average phase slope value (Berkhout 1974; Oppenheim & Schafer 1989, ch. 5).
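The complete phase slope computation (equation (1), 3-point median smoothing, mean over frequency, one-sample window shifts, then positive zero crossings) can be sketched as below. This is a minimal NumPy sketch under stated assumptions: the Hann analysis window follows the Hanning window mentioned in §3.1; re-referencing the mean to the window centre, so that an impulse at the centre gives zero, is our reading of "centred around the origin"; all function names are hypothetical.

```python
import numpy as np

def mean_phase_slope(segment, nfft):
    """Mean of the 3-point-median-smoothed phi'(omega) for one windowed segment.

    The result is re-referenced to the window centre (assumption), so a lone
    impulse at the centre of the window gives a value near zero.
    """
    n_samples = len(segment)
    n = np.arange(n_samples)
    X = np.fft.rfft(segment, nfft)                 # FT of x(n)
    Y = np.fft.rfft(n * segment, nfft)             # FT of y(n) = n x(n)
    denom = np.maximum(np.abs(X) ** 2, np.finfo(float).tiny)
    dphi = -(X.real * Y.real + X.imag * Y.imag) / denom    # equation (1)
    padded = np.pad(dphi, 1, mode="edge")          # 3-point median filter
    smoothed = np.median(np.stack([padded[:-2], padded[1:-1], padded[2:]]), axis=0)
    return smoothed.mean() + (n_samples - 1) / 2.0


def phase_slope_function(x, win_len):
    """Slide a Hann-windowed frame one sample at a time over x.

    Entry t is the mean phase slope for the frame centred on sample t
    (exactly centred when win_len is odd).
    """
    x = np.asarray(x, dtype=float)
    window = np.hanning(win_len)
    nfft = 1 << int(np.ceil(np.log2(2 * win_len + 1)))   # DFT order > 2 * window size
    half = win_len // 2
    padded = np.concatenate([np.zeros(half), x, np.zeros(win_len - half)])
    return np.array([mean_phase_slope(padded[t:t + win_len] * window, nfft)
                     for t in range(len(x))])


def positive_zero_crossings(psf):
    """Indices i where psf goes from negative at i-1 to non-negative at i."""
    psf = np.asarray(psf)
    return np.flatnonzero((psf[:-1] < 0) & (psf[1:] >= 0)) + 1
```

For two isolated impulses at samples 50 and 130 and a 41-sample window, the phase slope function rises as $t - \tau$ around each impulse, and the positive zero crossings fall on (or within one sample of) 50 and 130. An odd window length keeps the frame exactly centred on each output sample.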
3. Illustration of the method for synthetic and natural speech signals
(Smits & Yegnanarayana 1994) 

3.1 Instants of excitation in synthetic signals 

The phase slope function is computed for several cases of synthetic signals, as shown in
figures 4-9. Examples are given for two different window sizes in each case. Each figure
contains three plots: (a) the original time signal, (b) the phase slope function for an analysis
window of size 6.4 ms, and (c) the phase slope function for an analysis window of size
12.8 ms. Figure 4 is for a periodic impulse sequence, figure 5 is for a periodic impulses
sequence, and figure 6 is for the output of an all-pole model excited by the periodic impulses
sequence. In figures 4-6 we can see that the positive zero-crossing instants of the phase
slope functions correspond to the instants of major excitation within the chosen analysis
window. In figure 4 a large portion of the phase slope function is linear around the instant
of major excitation. This linear part is the result of the dominance of one excitation in the
analysis window.

Figure 5. Phase slope function for a periodic impulses sequence. (a) Periodic impulses
sequence. (b) Phase slope function for a window size of 6.4 ms. (c) Phase slope function
for a window size of 12.8 ms.

Figure 6. Phase slope function for the output of an all-pole model excited by the periodic
impulses sequence of figure 5a. (a) Output of an all-pole model excited by a periodic
impulses sequence. (b) Phase slope function for a window size of 6.4 ms. (c) Phase slope
function for a window size of 12.8 ms.

If a signal containing minor excitations is analysed using a large-size window (figures 5
and 6), the phase characteristics due to major excitation dominate the phase slope
function, and the minor excitations do not influence the positive zero-crossings. However, the
presence of these minor excitations may sometimes make it difficult to identify the instants
of major excitation.

Figures 4-6 also show that the phase slope function is mostly dictated by the excitation
signal. It is interesting to note that neither the minimum phase all-pole system nor the
location and size of the analysis window has influenced the decision on the excitation
instants obtained from the phase slope function. Even for a window size greater than a
pitch period, the use of the Hanning window reduces the effects of the surrounding impulses
on the resulting extraction of the excitation instants. This is because within a window only
one major excitation impulse is likely to dominate.

For random noise (figure 7), the features in the phase slope function are different for
different window sizes. It is interesting to note that any major excitation in the noise signal
will clearly show up irrespective of the size of the analysis window. For any noise signal
these excitation instants will be distributed randomly in time. That is why the method will
not work well for noisy speech, as the excitation due to noise will show up randomly in
between the instants of glottal closure. Figure 8 shows the behaviour of the phase slope
function for two different window sizes for a signal generated by exciting an all-pole filter
with random noise.

For sinusoids (figure 9), there are no isolated major points of excitation, and the phase
slope function does not show the characteristic linear part around the positive zero-crossing
instants. The features of the phase slope function are different for different window sizes,
and the effects of windowing show up clearly in the resulting phase slope function.

Figure 7. Phase slope function for a random noise sequence. (a) A random noise
sequence. (b) Phase slope function for a window size of 6.4 ms. (c) Phase slope function
for a window size of 12.8 ms.

Figure 8. Phase slope function for an all-pole model excited by a random noise sequence.
(a) Output of an all-pole model excited by a random noise sequence. (b) Phase slope
function for a window size of 6.4 ms. (c) Phase slope function for a window size of 12.8 ms.

3.2 Instants of significant excitation in natural speech signals 

In this section we discuss the performance of the proposed method on natural speech 
data. In all the illustrations to follow, the speech signals were sampled at 10 kHz. In order 
to minimize the effects of position of the analysis window with respect to the impulse 
response, it is better to obtain at least an approximation to the excitation signal before 
computing the average slope of the phase spectrum. Linear prediction (LP) residual (Markel 
& Gray 1976) is a good approximation to the excitation signal, as the correlation between 
adjacent samples is significantly reduced from what it is in the original signal. Note that 
since inverse filtering in linear prediction analysis is in effect passing the speech signal 
through a minimum phase system, the phase slope characteristics of the excitation will not 
be altered in the residual. For the computation of the residual, a 10th order linear predictor 
was used in this study, although the order is not very critical for this analysis. 
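One way to obtain the LP residual is sketched below: autocorrelation-method linear prediction (Levinson-Durbin recursion) on Hamming-windowed frames, followed by inverse filtering with $A(z) = 1 + \sum_k a_k z^{-k}$. The 10th order follows the text; the 20 ms frame, 10 ms hop and Hamming window are assumed values, and the function names are hypothetical. This is a generic LP sketch, not the authors' implementation.

```python
import numpy as np

def lpc_autocorrelation(frame, order):
    """LP coefficients [1, a_1, ..., a_p] of A(z) = 1 + sum_k a_k z^-k.

    Autocorrelation method on a Hamming-windowed frame, solved by the
    Levinson-Durbin recursion.
    """
    w = frame * np.hamming(len(frame))
    r = np.array([np.dot(w[:len(w) - k], w[k:]) for k in range(order + 1)])
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    for i in range(1, order + 1):
        if err <= 0:          # silent or degenerate frame: keep lower-order model
            break
        k = -np.dot(a[:i], r[i:0:-1]) / err     # reflection coefficient
        a[:i + 1] = a[:i + 1] + k * a[i::-1]    # order-update of the predictor
        err *= 1.0 - k * k
    return a


def lp_residual(s, order=10, frame_len=200, hop=100):
    """Inverse-filter s frame by frame: e(n) = s(n) + sum_k a_k s(n - k).

    Coefficients estimated on the frame starting at each hop are applied to
    the first `hop` samples of that frame. frame_len=200 and hop=100
    (20 ms / 10 ms at 10 kHz) are assumed values, not taken from the paper.
    """
    s = np.asarray(s, dtype=float)
    residual = np.zeros_like(s)
    for start in range(0, len(s), hop):
        frame = s[start:start + frame_len]
        frame = np.pad(frame, (0, frame_len - len(frame)))   # zero-pad last frame
        a = lpc_autocorrelation(frame, order)
        for n in range(start, min(start + hop, len(s))):
            past = s[max(0, n - order):n][::-1]              # s(n-1), s(n-2), ...
            residual[n] = s[n] + np.dot(a[1:len(past) + 1], past)
    return residual
```

Because inverse filtering removes most of the short-term correlation, a strongly resonant signal yields a residual with far less energy than the signal itself. This is the decorrelation property that makes the residual a reasonable stand-in for the excitation.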

Since for speech data there will be several points of excitation even within a pitch period, 
the phase slope function will have many fluctuations. In order to determine the instants of 
significant excitation, the points of positive zero-crossing of the phase slope function are 
obtained by smoothing the function. 

Once the major excitations are identified, it is possible to search for the presence of
other excitation instants by computing the phase slope function on the residual using
a smaller analysis window, typically half the size of the original window. Thus the
method, in principle, enables us to determine other significant instants in the excitation.
For identifying a single significant instant in each pitch period, it is preferable to have an
analysis window size in the range of one to two pitch periods.

Figure 9. Phase slope function for a sinusoidal sequence. (a) Sinusoidal sequence.
(b) Phase slope function for a window size of 6.4 ms. (c) Phase slope function for a
window size of 12.8 ms.

Figure 10. Illustration of extraction of the instants of significant excitation for
voiced speech. (a) A segment of voiced speech signal. (b) Linear prediction residual
of (a). (c) Phase slope function for the LP residual signal in (b). (d) Unit impulse
sequence with impulses located at positive zero-crossing instants of the plot in (c).
(e) Smoothed phase slope function. (f) Unit impulse sequence with impulses located at
positive zero-crossing instants of the plot in (e). (g) Gain plot indicating the strengths
of the impulses at the positive zero-crossing instants of (e).

Figures 10a and b show a segment of voiced speech and its linear prediction residual,
respectively. Figure 10c shows the phase slope function computed from the residual.
Figure 10d gives all the positive zero-crossing instants in the phase slope function. To select
the ones corresponding to significant excitation, the phase slope function is smoothed
using a 13-point Hanning window. Note that the size of the smoothing window is not very
critical, as long as it removes some fine fluctuations. The positive zero-crossing instant
corresponding to the major excitation in the analysis window is not affected by this smoothing
operation. The instants of positive zero-crossing in the smoothed phase slope function
are shown in figure 10f.
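The smoothing step can be sketched with NumPy as follows. The 13-point Hanning window is taken from the text; normalising it to unit sum (so the smoothing does not rescale the function) and the function names are our assumptions.

```python
import numpy as np

def smooth_phase_slope(psf, length=13):
    """Smooth a phase slope function with a unit-sum `length`-point Hann window."""
    w = np.hanning(length)
    return np.convolve(psf, w / w.sum(), mode="same")


def positive_zero_crossings(v):
    """Indices i where v[i-1] < 0 <= v[i] (negative-to-non-negative crossing)."""
    v = np.asarray(v)
    return np.flatnonzero((v[:-1] < 0) & (v[1:] >= 0)) + 1
```

On a synthetic ramp $0.5(t - 50)$ with an added alternating $\pm 3$ fluctuation, the raw function has many positive crossings near $t = 50$. Because the symmetric window preserves the linear ramp while cancelling the alternating component, the smoothed function keeps a single crossing at the ramp's zero, as the text describes.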

We have also computed the gain at each of the zero-crossing instants, by computing the 
square root of the average energy per sample in the LP residual for the interval between 
two successive zero-crossing instants. The interval is centered around the zero-crossing 
point under consideration. The resulting gain values are shown in figure 10g. These values
may be viewed as strengths of the impulses at the selected instants. 
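The gain computation can be sketched as below. How the "centred" interval is bounded is not fully specified, so this sketch assumes one reading: each instant's interval runs from the midpoint with the previous instant to the midpoint with the next one, with the signal ends used for the first and last instants. The function name `instant_gains` is hypothetical.

```python
import numpy as np

def instant_gains(residual, instants):
    """Gain at each instant: sqrt of the mean squared LP residual over its interval.

    Assumption: the interval for instant i spans from the midpoint with the
    previous instant to the midpoint with the next one (signal ends for the
    first and last instants). `instants` must be strictly increasing.
    """
    residual = np.asarray(residual, dtype=float)
    instants = np.asarray(instants, dtype=int)
    mids = (instants[:-1] + instants[1:] + 1) // 2   # each interval keeps its instant
    starts = np.concatenate([[0], mids])
    stops = np.concatenate([mids, [len(residual)]])
    return np.array([np.sqrt(np.mean(residual[a:b] ** 2))
                     for a, b in zip(starts, stops)])
```

Using the square root of the average energy per sample, rather than the total energy, keeps gains comparable between instants whose intervals have different lengths.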
Figure 11. Illustration of extraction of the instants of significant excitation for a
portion of the utterance for the sentence "ANY DICTIONAry will give..." by a male
speaker. (a) Speech signal. (b) Linear prediction residual. (c) Phase slope function
for the linear prediction residual signal. (d) Gain plot showing the strengths of the
impulses at the significant instants.


3.3 Analysis of continuous speech 

Figure 11 illustrates the result of our method on the initial part of the utterance "ANY
DICTIONAry will give at least...", uttered by a male speaker. This utterance consists of a
variety of segments such as voiced, unvoiced, nasal, transition, stop and fricative. The length
of the analysis window was 10 ms, and the average pitch period was 8 ms. Figures 11a and
b show the speech waveform and the LP residual signal, respectively. Figures 11c and d
give the phase slope function and the gain plots, respectively, for the residual signal of
figure 11b. The voiced segments have clear quasi-periodic instants of excitation. The
unvoiced and silence parts show random instants of excitation. The instants (figure 11d)
identified by the phase slope function indeed correspond to the instants of significant
excitation, as can be seen by comparing with the residual signal in figure 11b.

Let us look at the individual signal categories in some detail. In the silence and unvoiced 
fricative regions, almost no significant excitation was identified. Note that in the unvoiced 
fricative regions the instants are at irregular intervals as in the case of noise, and hence are 
not significant. However, whenever there is a transition from one category to another, like 
at a burst release, the transition point is identified as a significant instant. In many weakly 
voiced regions there will not be any significant excitation, as evidenced by the residual 
signal. The same is reflected in the gain plot, although the speech waveform shows low 
frequency periodicity. Absence of significant excitation instants in this case can be verified 
by observing the amplitude spectra for these regions. Typically the spectra do not show 
any resonance or formant structure in these cases, but only show some energy at the pitch 
frequency. 

Since the technique uses the phase characteristics of the excitation signal, the vocal 
tract system has very little influence on the proposed method of determining the instants 
of excitation. This is why the technique works well not only for steady
vowels, but also for diphthongs, transitions, liquids and nasals. Note that the method shows
significant quasi-periodic excitations even in cases where they are not clearly evident in the
linear prediction residual.

The technique works equally well for female speech, for all categories of
segments. Note that because of the smaller average pitch period of female voices, a
smaller analysis window size may be required.

4. Prosodic manipulation (Yegnanarayana & Teunen 1994) 

In many applications and for studies in speech perception it is often desirable to generate 
speech with specified characteristics or to modify a given speech signal by incorporating 
some specified features. The features may include changes in the vocal tract system and 
source characteristics. These characteristics at a segmental level may correspond to, for 
example, the average pitch, vocal tract length and the source-tract interaction within each 
pitch period. At the suprasegmental level, the characteristics of interest are the durations of
units at the syllable or higher levels, intonation and the speaking rate. Here we address the issue of
modifying a given speech signal to incorporate specified features mainly at the suprasegmental
level. The emphasis is on the manipulation of prosodic features such as speaking rate and
intonation (Moulines & Laroche 1995). 

Availability of the instants of significant excitation makes prosodic manipulation easier,
in principle, as it is these instants that need to be modified to realize any desired prosodic
characteristics. We focus mainly on the manipulation of speaking rate and pitch
period, although it is also possible to effect changes in the segmental characteristics.
We discuss the procedure to incorporate the desired prosodic modifications. However, the
procedure to derive the modification rules themselves is not within the scope of this work.

Figure 12. Illustration of prosodic manipulation. V and U are the voiced and unvoiced
labels for the instants. $T_i$s are intervals between instants in voiced regions, and
$t_i$s are intervals between instants in unvoiced regions. (a) Instants in the input data.
(b) Instants shifted due to time scale multiplication factor $\alpha$. (c) The new instants
and the entries in the pitch period field at each instant in voiced and unvoiced
regions, where the pitch period is modified by a factor $\beta$. Note that the spacing between
impulses is $\beta T_i$ in voiced regions and $t_i$ in unvoiced regions.

The data available for prosodic manipulation are the speech signal, the instants of 
significant excitation in the form of a gain function, and the linear prediction coefficients 
(LPCs) data file with voiced (V)/unvoiced (U) labels. A windowed speech signal is taken 
centred around each of these instants, and a residual signal is obtained by passing the 
speech signal through the inverse filter defined by the predetermined LPCs for the segment. 
From the residual signal around the instant, the required number of samples is taken to 
associate with the current instant. The gain per sample at the instant is computed as the 
square root of the mean energy per sample (the mean squared value) of the residual signal 
associated with the instant. 
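A minimal sketch of the per-instant analysis above, in Python. It assumes the predictor convention $s[n] \approx \sum_k a_k s[n-k]$, so the inverse filter is $A(z) = 1 - \sum_k a_k z^{-k}$. It also treats samples before the segment as zero and skips the analysis window. The names `lp_residual` and `gain_per_sample` are illustrative, not taken from the paper.

```python
import math


def lp_residual(s, a):
    """Inverse-filter s with A(z) = 1 - sum_k a[k] z^-(k+1).

    Samples before the start of the segment are treated as zero.
    """
    p = len(a)
    e = []
    for n in range(len(s)):
        prediction = sum(a[k] * s[n - k - 1] for k in range(p) if n - k - 1 >= 0)
        e.append(s[n] - prediction)
    return e


def gain_per_sample(e):
    """Square root of the mean energy per sample (RMS value) of the residual."""
    return math.sqrt(sum(v * v for v in e) / len(e))


# Usage: a 2nd-order all-pole signal excited by a single impulse at n = 0.
a = [1.3, -0.8]
s = []
for n in range(30):
    past = sum(a[k] * s[n - k - 1] for k in range(2) if n - k - 1 >= 0)
    s.append((1.0 if n == 0 else 0.0) + past)
e = lp_residual(s, a)    # recovers the impulse: e[0] = 1, other samples ~0
g = gain_per_sample(e)   # sqrt(1/30)
```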

The basic approach in prosodic manipulation is to derive an excitation signal incorporating 
the desired modification in the speaking rate and the pitch period. This is done by first 
taking the instants information in the gain function, and creating new instants data 
incorporating the speaking rate and pitch period modifications, specified in the form of 
scale factors for these parameters. We associate with each instant the time, the pitch period 
(interval between successive instants), the LP residual and the LPCs. For speaking 
rate/duration manipulation, we obtain the new scaled time instants using the time scale 
manipulation factor. Likewise, for pitch manipulation, the pitch period associated with each 
instant is scaled appropriately. A new set of instants and the parameters at these new 
instants are then determined as follows (see figure 12): 

Proceeding from left to right, the first instant is copied as a new instant. The next new 
instant should be one pitch period away from the first one, the period information being 
available in the parameter list associated with the first instant. Determine which of the 
old instants is closest to the new instant, and associate the parameter information of that 
old instant with the new instant. It is also possible to obtain an interpolated value of the 
pitch period for the current new instant from the pitch periods at the old instants on either 
side of it. Use the pitch period value in the parameter list at the current new instant to 
obtain the next new instant. This process is repeated until the end of all the instants 
derived from the original speech data. 
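The instant-remapping procedure can be sketched as follows. This is an illustration under stated assumptions, not the author's implementation:
- Times and periods are in samples, and each old instant stores the interval to the next instant.
- $\alpha$ scales the time axis, while $\beta$ scales only voiced pitch periods. Unvoiced periods are left unchanged, as described later in this section.
- Ties for "closest old instant" go to the earlier instant.

The interpolation option is omitted. `Instant` and `remap_instants` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Instant:
    time: float    # position of the instant (samples)
    period: float  # interval to the next instant (samples); must be > 0
    voiced: bool   # V/U label


def remap_instants(old, alpha, beta):
    """Return (new_time, index_of_closest_old_instant) pairs.

    alpha scales the time axis (speaking rate); beta scales the pitch period
    of voiced instants only, leaving unvoiced periods unchanged.
    """
    scaled = [Instant(i.time * alpha,
                      i.period * beta if i.voiced else i.period,
                      i.voiced) for i in old]
    end = scaled[-1].time
    new = []
    t = scaled[0].time  # the first instant is copied as a new instant
    while t <= end:
        # associate the new instant with the closest (time-scaled) old instant
        k = min(range(len(scaled)), key=lambda j: abs(scaled[j].time - t))
        new.append((t, k))
        t += scaled[k].period  # step by the period stored at that instant
    return new


old = [Instant(10.0 * i, 10.0, True) for i in range(4)]
halved = remap_instants(old, alpha=1.0, beta=0.5)  # pitch period halved
slowed = remap_instants(old, alpha=2.0, beta=1.0)  # speaking rate halved
```

In both usage cases the number of new instants grows from 4 to 7. With `beta=0.5` each old instant's parameters are reused for about two new periods. With `alpha=2.0` the pitch is kept and pitch periods are effectively repeated.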

Problems arise when copying the residual samples at the new instants from the parameter 
list associated with the old instants, if the new pitch period is smaller than the old value 
at that instant. In that case only the required number of residual samples around the 
instant is copied, and to avoid discontinuities due to this partial selection, the residual 
samples are multiplied by a Hanning window. The signal is then high-pass filtered (cut-off 
frequency of about 50 Hz) to remove the very low frequency components, including the zero 
frequency component. This ensures that the resulting residual signal has zero mean. This 
process may produce some distortion, especially when the pitch period is scaled down 
significantly, say by a multiplication factor of 0.5 or lower. If the scaled pitch period is 
larger than the old one, the additional excitation samples needed in each pitch period are 
set to zero. The resulting excitation samples are scaled to obtain the gain value specified 
in the parameter list for the instant. 
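The following sketches how one pitch period of excitation might be built at a new instant. It makes these assumptions:
- The stored residual is centred on its instant, and the required number of samples is taken from the centre.
- Extra samples are appended as zeros at the end.
- "Gain" means the RMS value per sample.

The 50 Hz high-pass filtering step is not included. `hann` and `period_excitation` are illustrative names.

```python
import math


def hann(n):
    """Symmetric Hann (Hanning) window of length n."""
    if n == 1:
        return [1.0]
    return [0.5 - 0.5 * math.cos(2 * math.pi * k / (n - 1)) for k in range(n)]


def period_excitation(residual, new_period, gain):
    """Excitation samples for one pitch period at a new instant.

    residual:   residual samples stored for the old instant (assumed centred on it)
    new_period: required number of samples (the new pitch period)
    gain:       target gain per sample (RMS) from the parameter list
    """
    if new_period < len(residual):
        # shorter period: keep the central samples and taper them
        start = (len(residual) - new_period) // 2
        seg = residual[start:start + new_period]
        seg = [x * w for x, w in zip(seg, hann(new_period))]
    else:
        # longer period: the additional samples are set to zero
        seg = list(residual) + [0.0] * (new_period - len(residual))
    rms = math.sqrt(sum(x * x for x in seg) / new_period)
    return [x * gain / rms for x in seg] if rms > 0 else seg


longer = period_excitation([1.0] * 4, 6, 2.0)   # two zeros appended, RMS 2
shorter = period_excitation([1.0] * 8, 5, 1.0)  # central 5 samples, tapered, RMS 1
```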

For instants labelled as unvoiced (U), the required number of residual samples is copied 
from the residual signal associated with the instants. For these instants, the entry in the 
pitch period field is not modified. Therefore, if the interval between instants increases due 
to expansion of the time scale (slower speaking rate), some segments of the residual samples 
belonging to the unvoiced portion may be repeated. Sometimes this will produce some 
audible distortion. One way to overcome this is to use random samples, with appropriate 
gain, instead of repeating the residual samples. 

The speech signal is generated by exciting the all-pole model, defined by the LPCs and the 
gain parameter, with the new excitation signal. It is also possible to vary the all-pole model 
characteristics within a pitch period to reflect the differences in the vocal tract system 
between the closed and open phases of the glottal cycle. This is realized approximately by 
using, for the open phase, a set of LPCs corresponding to poles moved towards the origin in 
the z-plane. This creates an effect of damping of the formants in the all-pole model 
representation of the vocal tract system. The damping effect can be controlled by a 
parameter used to modify the LPCs; the parameter is simply the radius of a circle in the 
z-plane concentric with the unit circle. 

It is also possible to generate the excitation signal using a model of the glottal pulse for 
voiced segments and random noise for unvoiced segments, and to synchronize them 
appropriately with the instants. The glottal pulse model can be any of those described in 
the literature (Childers & Wong 1994). 
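The pole-damping idea can be sketched by scaling the LPCs. Replacing $a_k$ by $\gamma^k a_k$ maps every pole $z_i$ of $1/A(z)$ to $\gamma z_i$, so the unit circle maps onto a concentric circle of radius $\gamma$. This is assumed to correspond to the radius parameter described above; the paper does not give the formula. The synthesis filter is a direct-form all-pole recursion with zero initial state. Both functions are hypothetical helpers.

```python
def damp_lpcs(a, gamma):
    """Scale a_k by gamma**k: every pole z_i of 1/A(z) moves to gamma * z_i,
    so each pole radius is multiplied by gamma (0 < gamma < 1 damps formants)."""
    return [ak * gamma ** (k + 1) for k, ak in enumerate(a)]


def all_pole_synthesis(excitation, a, gain=1.0):
    """y[n] = gain * u[n] + sum_k a[k] * y[n-k-1], with zero initial state."""
    y = []
    for n, u in enumerate(excitation):
        past = sum(a[k] * y[n - k - 1] for k in range(len(a)) if n - k - 1 >= 0)
        y.append(gain * u + past)
    return y


a = [1.3, -0.8]          # complex pole pair of radius sqrt(0.8)
d = damp_lpcs(a, 0.9)    # pole radius becomes 0.9 * sqrt(0.8)
y = all_pole_synthesis([1, 0, 0, 0], a)  # impulse response: 1.0, 1.3, 0.89, ...
```

In a synthesizer following this section, `d` would replace `a` for the open-phase samples of each pitch period.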


5. Analysis of voice source characteristics 

In this section we shall examine how the knowledge of the instants of significant excitation 
can be used for a careful analysis of the source characteristics, especially for voiced speech. 
So far we have assumed only one major excitation within each pitch period, and we have 
used an analysis window of about one pitch period or longer to extract the instants of 
significant excitation, which correspond to the instants of glottal closure. There is also 
excitation, although minor, at the instant of glottal opening. If we use an analysis window 
of less than one pitch period, typically about half the pitch period, then we will have one 
additional positive zero-crossing of the phase slope function within each pitch period, and 
this corresponds to the opening instant. This can be seen in figure 5b, where the second 
positive zero-crossing corresponds to the instant of the small negative pulse. Note that it 
will not appear if the window size is of the order of a pitch period, as in figure 5c. Thus, if 
the excitation at the instant of glottal opening is also significant, it can be extracted with 
the proposed method using a smaller window. These minor excitation instants can be 
identified after extracting the instants of major excitation at the glottal closure using a 
longer window. 

Figure 13. Decomposition of the LP residual into deterministic and random parts: (a) LP 
residual for two successive pitch periods, (b) deterministic part and (c) random part of the 
LP residual. 
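Given a sampled phase slope function (computed as in the earlier sections of the paper, not reproduced here), the following sketches two steps: locating positive zero-crossings, and separating the minor (opening) instants from the major (closure) ones. A positive zero-crossing is taken to mean a transition from negative to non-negative values. The tolerance `tol` for matching a crossing to a major instant is a hypothetical parameter.

```python
def positive_zero_crossings(f):
    """Indices k where f goes from negative (f[k-1] < 0) to non-negative (f[k] >= 0)."""
    return [k for k in range(1, len(f)) if f[k - 1] < 0 <= f[k]]


def minor_instants(major, short_window_crossings, tol=2):
    """Crossings found with the short window that lie more than tol samples from
    every major (glottal closure) instant: candidate glottal opening instants."""
    return [c for c in short_window_crossings
            if all(abs(c - m) > tol for m in major)]


crossings = positive_zero_crossings([-1, 1, 2, -1, -2, 3])  # [1, 5]
openings = minor_instants([1, 11], [1, 6, 11])              # [6]
```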

The source for voiced speech also has a component of turbulent noise around the instant 
of glottal closure. The voice source within each pitch period can be viewed as consisting 
of two components: one corresponding to the deterministic part of excitation and the other 
to the random part. Thus we need to separate the excitation signal into the deterministic 
and random parts. In order to do this, we take the LP residual signal for two successive 
pitch periods using the knowledge of the glottal closure instants. A recently proposed 
decomposition algorithm can be used to separate the deterministic and random parts of the 
excitation (d’Alessandro et al 1995). The results are shown in figure 13. Figure 13a shows 
the LP residual for two successive periods, and figures 13b and c show the deterministic 
and random parts of the residual, respectively. The random part shows large amplitude 
noise signals near the glottal closure instants, and these noise signals can be attributed to 
the turbulent noise in the excitation. Autocorrelation functions of these three signals are 
shown in figure 14. The autocorrelation function of the random part (figure 14c) does not 
show any peak at the pitch period as in the autocorrelation function for the deterministic 
part (figure 14b). 
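The autocorrelation comparison in figure 14 can be illustrated with a short sketch. It computes a normalized autocorrelation without mean removal; this is an assumption, since the paper does not specify the estimator. It is applied to two stand-in signals, neither of which is the paper's data:
- an idealised pulse train, standing in for the deterministic part;
- seeded Gaussian noise, standing in for the random part.

```python
import random


def normalized_autocorrelation(x, max_lag):
    """r[k] = sum_n x[n] x[n+k] / sum_n x[n]**2 for k = 0..max_lag (no mean removal)."""
    r0 = sum(v * v for v in x)
    return [sum(x[n] * x[n + k] for n in range(len(x) - k)) / r0
            for k in range(max_lag + 1)]


# "Deterministic" stand-in: one pulse every 80 samples (pitch period of 80 samples)
pulses = [1.0 if n % 80 == 0 else 0.0 for n in range(160)]
r = normalized_autocorrelation(pulses, 100)   # peak r[80] = 0.5, r[40] = 0.0

# "Random" stand-in: no peak is expected at the pitch period
rng = random.Random(1)
noise = [rng.gauss(0.0, 1.0) for _ in range(160)]
rn = normalized_autocorrelation(noise, 100)
```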

6. Analysis of vocal tract system characteristics 

In this section we present methods to extract the characteristics of the vocal tract system 
from speech signals, especially when the characteristics are changing as in dynamic sounds 
such as consonant-vowel combinations. The methods are based on selecting appropriate 
segments of speech for analysis. The selection of the segments is based on the instants of 
significant excitation of the vocal tract system. The analysis window is chosen starting at 
each of these instants. The vocal tract system is represented by the formants or resonances 
extracted from the short segment. Because the segments are selected at the instants of 
glottal closure, the extracted formants will be consistent across successive pitch periods in 
voiced speech. 

For voiced speech, a short (about 3 ms) segment of speech before the significant instant 
is considered as belonging to the open phase and the segment after the instant is considered 
as belonging to the closed phase. In practice there is no guarantee that there will be a distinct 
closed phase region. The samples after the instant of glottal closure are analysed to extract 
the information in that region. Likewise, the samples before the instant are analysed and 
the results are attributed to the open phase region. The number of samples to be considered 
should be less than the period between the current and the next significant instants. The 
minimum number of samples required depends on the order of the model and also on the 
method used to compute the parameters of the model. It is preferable to use as many 
samples as possible, but the larger the number, the more likely it is that the vocal tract 
system characteristics will change within the analysis interval. 
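Selecting the open- and closed-phase analysis segments around each instant can be sketched as follows. The 30-sample default matches the segment size used in this study. Capping the closed-phase segment at the distance to the next instant follows the constraint above; the same cap towards the previous instant is assumed for the open phase. `phase_segments` is an illustrative name.

```python
def phase_segments(signal, instants, seg_len=30):
    """For each instant i return (open_phase, closed_phase) lists of samples.

    closed_phase: up to seg_len samples starting at the instant;
    open_phase:   up to seg_len samples just before the instant.
    Each length is capped at the distance to the neighbouring instant.
    """
    segments = []
    for k, i in enumerate(instants):
        nxt = instants[k + 1] if k + 1 < len(instants) else len(signal)
        prv = instants[k - 1] if k > 0 else 0
        n_closed = min(seg_len, nxt - i)
        n_open = min(seg_len, i - prv)
        segments.append((signal[i - n_open:i], signal[i:i + n_closed]))
    return segments


segs = phase_segments(list(range(100)), [10, 50, 60])
```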




Figure 14. Autocorrelation functions of the signals in figure 13. (a) Autocorrelation 
function of the LP residual signal. (b) Autocorrelation function of the deterministic 
part. (c) Autocorrelation function of the random part. 

A segment size of 30 samples (corresponding to 3 ms at a 10 kHz sampling rate) is 
considered in this study. Short-time spectral analysis is not useful on such segments 
because of the poor resolution of the spectral components. High-resolution model-based 
analysis methods can be used to extract the formant information corresponding to the vocal 
tract system in that region. We have used the standard covariance LP analysis to derive the 
LP spectrum, from which the formants can be estimated (Markel & Gray 1976). 

Figure 15. LP spectra (10th order) for eight successive closed phase regions in a 
voiced speech segment. 

Figure 16. LP spectra (10th order) for eight successive open phase regions in a 
voiced speech segment. 
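The following is a pure-Python sketch of covariance-method LP analysis, in the spirit of Markel & Gray (1976), and of evaluating the LP spectrum $20\log_{10}|1/A(e^{j\omega})|$. The paper uses 10th-order models. The usage below instead fits a 2nd-order model to an idealised 30-sample "closed phase": an all-pole response with no excitation inside the fitted range. For such a signal the covariance method recovers the coefficients exactly. Gaussian elimination stands in for the Cholesky decomposition usually used, and the function names are illustrative.

```python
import cmath
import math


def covariance_lpc(s, p):
    """Covariance-method LP coefficients a_1..a_p (predictor s[n] ~ sum_k a_k s[n-k]).

    The prediction error is minimised over n = p..len(s)-1 only, so no samples
    outside the analysis segment are needed.
    """
    n = len(s)
    phi = [[sum(s[m - i] * s[m - k] for m in range(p, n)) for k in range(p + 1)]
           for i in range(p + 1)]
    # normal equations sum_k a_k phi[i][k] = phi[i][0], i = 1..p (augmented matrix)
    aug = [[phi[i][k] for k in range(1, p + 1)] + [phi[i][0]] for i in range(1, p + 1)]
    for c in range(p):  # Gaussian elimination with partial pivoting
        piv = max(range(c, p), key=lambda r: abs(aug[r][c]))
        aug[c], aug[piv] = aug[piv], aug[c]
        for r in range(c + 1, p):
            f = aug[r][c] / aug[c][c]
            for j in range(c, p + 1):
                aug[r][j] -= f * aug[c][j]
    a = [0.0] * p
    for r in range(p - 1, -1, -1):  # back substitution
        a[r] = (aug[r][p] - sum(aug[r][j] * a[j] for j in range(r + 1, p))) / aug[r][r]
    return a


def lp_spectrum_db(a, fs, n_points=256):
    """(frequency in Hz, 20 log10 |1/A(e^jw)|) on a grid from 0 up to just below fs/2."""
    spectrum = []
    for i in range(n_points):
        w = math.pi * i / n_points
        A = 1 - sum(ak * cmath.exp(-1j * w * (k + 1)) for k, ak in enumerate(a))
        spectrum.append((fs * i / (2 * n_points), -20 * math.log10(abs(A))))
    return spectrum


true_a = [1.3, -0.8]
seg = []
for n in range(30):  # 3 ms at 10 kHz, excited only at n = 0
    past = sum(true_a[k] * seg[n - k - 1] for k in range(2) if n - k - 1 >= 0)
    seg.append((1.0 if n == 0 else 0.0) + past)
a_hat = covariance_lpc(seg, 2)  # ~[1.3, -0.8]
peak_hz = max(lp_spectrum_db(a_hat, 10000), key=lambda t: t[1])[0]  # about 1.2 kHz
```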

Choosing the analysis frames synchronized with the instants of significant excitation gives 
consistently better results than choosing frames at regular spacing. Figure 15 shows the LP 
spectra for eight successive closed phase regions in a segment of voiced speech. The formant 
locations and their movements can be clearly seen from the figure. Figure 16 shows the LP 
spectra for the corresponding eight successive open phase regions in the segment. The 
formant locations and bandwidths in the open phase regions differ from the values for the 
corresponding closed phase regions; in particular, the formant bandwidths are consistently 
larger in the open phase regions. Thus knowledge of the instants of glottal closure in each 
pitch period enables us to extract the characteristics of the vocal tract system accurately. 
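The bandwidth difference between open and closed phases can be related to pole positions through a standard textbook approximation, which is not part of the paper. For a pole of the all-pole model at $z = r e^{j\theta}$ and sampling rate $f_s$, the formant frequency $F$ and 3 dB bandwidth $B$ are approximately

$$
F = \frac{\theta}{2\pi} f_s, \qquad B \approx -\frac{f_s}{\pi} \ln r .
$$

At $f_s = 10$ kHz, a pole radius $r = 0.95$ gives $B \approx 163$ Hz, while $r = 0.90$ gives $B \approx 335$ Hz. Poles closer to the origin therefore correspond to wider formants. This matches the larger open-phase bandwidths observed here and the damping used for open-phase synthesis in section 4.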

7. Conclusions 

In this paper we have discussed the importance of timing in time-frequency analysis of 
signals. The positioning of the analysis window in time-frequency analysis becomes critical 
in extracting dynamic source and system characteristics from speech signals. In order to 
position the analysis window suitably, it is necessary to determine the instants of significant 
events of production in the speech signal. We have proposed a method of extracting the 
instants of significant excitation of the vocal tract system, which correspond to the instants 
of glottal closure in voiced speech. We have demonstrated that knowledge of these instants 
makes prosodic manipulation of speech signals easy. It is also possible to extract the source 
and vocal tract system characteristics accurately, as knowledge of the instants of significant 
excitation enables us to choose the analysis segments suitably. In particular, we have shown 
that the variations in the characteristics of the vocal tract system can be derived accurately 
by focussing the analysis on successive closed phase regions in voiced speech. 


Part of this work was carried out at the Institute for Perception Research, Eindhoven 
Technical University, The Netherlands, during the author’s visit to the Institute in 1994. The 
author gratefully acknowledges the contributions of Prof. Rene Collier, Mr L F Willems, 
Dr Raynold Veldhuis, Mr R L H Smits and Mr Remco Teunen. 
Abstract. This paper describes a novel application of singular value decomposition (SVD) 
of subsets of the phase-space trajectory for calculating the attractor dimension of a small 
data set. A certain number of local centres (M) are chosen randomly on the attractor, and an 
adequate number of nearest neighbours (q = 50) are ordered around each centre. The local 
intrinsic dimension at a local centre is determined by the number of significant singular 
values, and the attractor dimension ($D_2$) by the average of the local intrinsic dimensions of 
the local centres. The SVD method has been evaluated for model data and EEG. The results 
indicate that the SVD method is a reliable approach for estimating the attractor dimension at 
moderate signal to noise ratios. The paper emphasises the importance of the SVD approach to 
EEG analysis. 

Keywords. Nonlinear dynamics; chaos; EEG; SVD; phase-space; attractor 
dimension. 

1. Introduction 

The field of nonlinear dynamics or chaos has undergone explosive growth in the last few 
years. Applications have been made in many diverse fields including physics, chemistry, 
fluid dynamics, meteorology, economics, medicine and sociology (Lin 1984; Thomson & 
Stewart 1987). One aspect of nonlinear dynamics involves time series analysis. The 
electroencephalogram (EEG) is a time series of the electrical activity of the brain. It is regarded as a 
paraphenomenon of integrated metabolic processes of the brain. It reflects the activities in 
the underlying brain structures and particularly that of the cerebral cortex below the scalp 
surface. EEG is one of the commonly used noninvasive tools for studying brain functions. 
The frequency domain representation of EEG has been in use for over half a century to 
discern the various patterns of brain activity in relation to behavioural states. In this, a 
signal block is taken and then Fourier-transformed to get the power spectral estimates 
which has led to the familiar alpha, beta, delta and theta components. The alpha activity 
represents EEG activity in the frequency range 8-13 Hz. It is seen during eye-closed relaxed 
states and is predominantly seen in the occipital region of the scalp. EEG activity in the 
frequency range 13-30 Hz is termed beta rhythm and characterises aroused and vigilant 
states of behaviour. Theta (4-8 Hz) activity is observed during drowsiness and in 
pathological conditions. Similarly, delta activity (0.5-4 Hz) is seen during deep sleep. 
However, these components never reflect the dynamics of the underlying process, beyond 
the frequency band in which the maximum power is concentrated. By contrast, the newly 
developed method of phase-space representation can show the dynamics of a system as it 
evolves with time. With the emergence of methods for handling time series generated by 
nonlinear dynamics, a different set of descriptive measures is now available. Progress in the 
theory of nonlinear dynamics in the past decade has made it possible to think of modelling 
the brain as a continuous, spontaneously changing, nonlinear dynamical system (… 1983). 
Grassberger & Procaccia (1983a, 1983b) further developed the tools to model experimental 
data as the output of a nonlinear dynamical system. The study of EEG falls in this category. 
These new concepts for investigating the microscopic properties of brain activity offer a 
fresh way to provide new explanations (Pool 1989). For example, consider the changing 
pattern of EEG as a person goes from wakefulness through drowsiness to deep sleep. Brain 
activity in the waking state is desynchronised, and on eye closure partial synchrony (alpha 
rhythm, 8-13 Hz activity) is observed. As drowsiness starts and sleep gradually ensues, 
various transitional states appear as stages 1, 2, 3 and 4 of the sleep cycle (Rechtschaffen & 
Kales 1968). Brain activity evolves from random-like behaviour to more periodic activity 
with increasing depth of slow wave sleep. This is broken by intermittent bursts of 
desynchronised activity of varying durations, called REM (rapid eye movement), during 
which dreams occur. A similar form of synchronised slow wave activity is seen in 
anaesthesia; a deep stage of anaesthesia is marked by low amplitude and highly periodic slow 
wave activity. The spike and wave activity during seizure discharge is a high frequency 
periodic activity of neurons, and the specific waveforms are repeated for a short duration. 
There are reports of high alpha activity and increased coherence during various meditative 
states. Psychotic states have not yielded any definite EEG patterns. Even the maturational 
status of the brain may be reflected in the infant EEG. Cognitive activities or intensive 
mental tasks have not produced any discernible patterns in EEG different from background 
activity. It is believed that nonlinear dynamical methods of analysis may discern these states 
of neural activity. It may be safely assumed that the brain's dynamical behaviour is well 
reflected in EEG. Therefore, nonlinear measures have found applications in the analysis and 
interpretation of EEG, and the seemingly random nature of EEG is thought to be due to 
chaotic neuronal activities. Calculation of the attractor dimension or correlation dimension 
has dominated recent literature and has been used to characterise the effects of anaesthesia, 
epileptic discharge and mental activity (Babloyantz & Destexhe 1986; Layne et al 1…; 
Mayer-Kress & Layne 1987; Watt & Hameroff 1987; Nan & Jinghua 1988). 

The dimension of the attractor is a characteristic feature of the underlying neural process 
generating the EEG signals. The dimension value of the attractor is of significance in 
feature detection of various brain states, classification of patterns of neural activity, 
differentiation of various types of neural activities and identification of specific drug 
effects on the brain. The attractor dimension directly reflects the degrees of freedom of the 
system under study. Therefore, nonlinear dynamics provides a model for signal generation 
and temporal prediction which may help in determining the nature of the neural processes 
governing the state of brain activity. Data on attractor dimension are available in the 
literature from subjects in normal resting states, during seizures, in stages of sleep and in 
states of anaesthesia. Correlation dimension analysis is also available from attentional tasks 
in humans and experimental learning situations in rats. Human data on attractor dimension 
in resting states have been rather less consistent, ranging from 3 to over 10. The 
dimensionality has consistently been seen to drop during an epileptic epoch. The 
difficulties in comparing disparate results arise from the algorithms used and the definition 
of dimensionality employed; agreed-upon uniform conventions for handling experimental 
time series may emerge in future. However, using the attractor dimension as a parameter, a 
specific predictive model can be built and experimentally verified. This newly gained 
insight into the chaotic dynamics of the brain is a significant departure from the earlier 
stochastic view. It appears that nonlinear dynamics is going to be the method of study for 
complex systems and their experimental time series. The application of nonlinear methods 
to EEG analysis has been discussed in detail in a previous paper (Pradhan & Narayana Dutt 
1993). 

The overall interpretation of an EEG record is based on a qualitative impression of the 
changing patterns in EEG activity. This study uses the singular value decomposition (SVD) 
of a subset of the phase-space trajectory to evaluate the attractor dimension. The SVD 
method has been applied to model data from the Henon and Lorenz maps. As EEG records 
are invariably noisy, various levels of noise have been added to the model data to evaluate 
the suitability of the method for experimental time series like EEG. The method has also 
been applied to real EEG data with alpha, beta, theta, delta and indeterminate activities for 
different data lengths (512, 1024, 2048, 4096, 8192, 16384 and 20480 points) to evaluate the 
data requirement of the SVD method for EEG analysis. 


2. Extracting attractor dimension 

The construction of a phase-space trajectory is a crucial step in the nonlinear analysis of an 
EEG time series, where the one-dimensional voltage data are transformed into a trajectory in 
a multidimensional phase space. An experimental time series is one-dimensional: at any 
given instant of time it has only one phase variable. Before the turn of the century, Poincaré 
showed that much can be learnt about dynamical behaviour from the analysis of trajectories 
in a multidimensional phase space in which a single point characterises the entire system at 
an instant of time. The experimentalist's dilemma has been that for a system with N degrees 
of freedom it seems necessary to measure N independent variables, an almost impossible 
chore for complex systems. The problem found a solution in the embedding theorem of 
Takens (Pradhan & Narayana Dutt 1993), which states that a multidimensional phase space 
can be constructed from the measurement of a single variable (like the electrical potential). 
For a time series $V(t_i)$, $i = 0, 1, 2, 3, \ldots, N$, the phase-space vector $x(t)$ is constructed 
by assigning the coordinates 

$$
\begin{aligned}
x_1(t) &= V(t),\\
x_2(t) &= V(t+T),\\
&\;\;\vdots\\
x_d(t) &= V\bigl(t+(d-1)T\bigr),
\end{aligned}
\qquad (1)
$$
where T is a delay time and d is the embedding dimension. The lag or delay T may be 
determined by the first zero of the corresponding autocorrelation function (Liebert & 
Schuster 1989). Once the multidimensional phase-space vectors have been constructed from 
a time series, the dynamical parameter, attractor dimension ($D_2$) or correlation dimension, 
may be calculated. 
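The delay embedding in (1) and the first-zero rule for choosing T can be sketched as follows. The autocorrelation is computed with the mean removed, which is an assumption, and the sine test signal is purely illustrative.

```python
import math


def first_zero_lag(v):
    """Smallest lag k >= 1 at which the mean-removed autocorrelation is <= 0."""
    n = len(v)
    mean = sum(v) / n
    x = [u - mean for u in v]
    for k in range(1, n):
        if sum(x[i] * x[i + k] for i in range(n - k)) <= 0:
            return k
    return None


def delay_embed(v, d, T):
    """Phase-space vectors (v[t], v[t+T], ..., v[t+(d-1)T]) as in (1)."""
    return [[v[t + j * T] for j in range(d)] for t in range(len(v) - (d - 1) * T)]


v = [math.sin(2 * math.pi * n / 40) for n in range(400)]
T = first_zero_lag(v)         # close to a quarter period (10 samples)
vectors = delay_embed(v, 3, T)
```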

Dimension is perhaps the most basic property of an attractor (Farmer et al 1983). To 
characterise an attractor, one has to determine its dimension. One classical estimate of 
attractor dimension is provided by the algorithm of Grassberger & Procaccia (1983a, 
1983b) (GPA), which counts, for each embedding dimension d, the number of pairs 
$(x_i, x_j)$ whose distance is less than a given radius r. More precisely, the algorithm 
computes the correlation integral C(r) given by 

$$
C(r) = \frac{1}{N^2} \sum_{i=1}^{N} \sum_{j=1}^{N} \Theta\bigl(r - \lvert x_i - x_j \rvert\bigr), \qquad (2)
$$

where $\Theta$ is the Heaviside function. 

Grassberger & Procaccia (1983a, 1983b) showed that the correlation integral C(r) obeys 
the following scaling law: 

$$
C(r) \sim r^{D_2}. \qquad (3)
$$

Therefore, 

$$
D_2 = \lim_{r \to 0} \frac{\log C(r)}{\log r}. \qquad (4)
$$


Here, C(r) is a measure of the probability that two arbitrary points $x_i$ and $x_j$ of the phase 
space will be separated by a distance less than r. The main point is that C(r) behaves as a 
power of r for small r. Therefore, plotting log C(r) versus log r allows us to calculate $D_2$ 
from the slopes of the curves. If the slopes of the graphs for increasing embedding 
dimensions converge to a saturation value, this limit is called the correlation dimension $D_2$. 
For a stochastic process, there is no convergence, and the slope keeps increasing with the 
embedding dimension. 
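The following is a direct $O(N^2)$ sketch of the correlation integral (2) and of a two-point slope estimate of $D_2$. Unlike (2), it excludes the self-pairs $i = j$. This is a common variant that avoids the constant $1/N$ offset at small r. The two radii passed to `slope_estimate` must lie in the scaling region, which is the choice of scaling region discussed below.

```python
import math


def correlation_integral(points, r):
    """C(r) as in (2), but excluding the trivial self-pairs i = j."""
    n = len(points)
    count = sum(1 for i in range(n) for j in range(n)
                if i != j and math.dist(points[i], points[j]) < r)
    return count / n ** 2


def slope_estimate(points, r1, r2):
    """Slope of log C(r) versus log r between two radii in the scaling region."""
    c1, c2 = correlation_integral(points, r1), correlation_integral(points, r2)
    return math.log(c2 / c1) / math.log(r2 / r1)


# A straight line in 2-D is a 1-dimensional set, so the slope should be close to 1.
pts = [(t / 199, 2 * t / 199) for t in range(200)]
d2 = slope_estimate(pts, 0.05, 0.1)
```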

The correlation dimension was introduced in information theory and is a generalization of 
the Hausdorff (or fractal) dimension $D_0$. Moreover, it provides a lower bound on the 
information dimension $D_1$, since $D_0 \geq D_1 \geq D_2$. GPA (Grassberger & Procaccia 1983a, 
1983b) is one of the most widely used methods for determining attractor dimension where 
high-end computing facilities are available. 

It can be seen that for a data length of N, GPA needs a time of order $N^2$, and this grows 
further if estimates are made for an entire range of embedding dimensions. An optimised 
method of estimating the correlation dimension has been reported by Grassberger (1990). 
The GPA has also been modified by other investigators for efficient computation of $D_2$ 
(Dvorak 1990; Theiler 1986). The GPA implemented on a workstation (HP 9000/735) takes 
nearly 45 minutes to calculate the correlation integrals for embedding dimensions 3 to 12 for 
only 1024 data points, and several hours for 40000 data points on the same machine. 
Moreover, GPA requires a large number of data points, 80000 or more, for a reliable 
estimate of the correlation dimension. GPA fares poorly in the presence of noise, where the 
plot of log C(r) vs log r may not converge. Further, the scaling region for determining the 
slope (hence $D_2$) is often fixed arbitrarily, giving rise to varying results from different 
investigators. We have therefore investigated the SVD method, which is computationally 
efficient and involves no arbitrary choice of a scaling region. 



3. Singular value approach for estimation of attractor dimension 

The SVD method has been used for dimension estimation of chaotic attractors (Broomhead 
& King 1986; Passamante et al 1989). In this study, we describe a novel application of SVD 
to the analysis of EEG signals. Considering the massive size of EEG data, this approach is 
computationally efficient in comparison to the standard GPA. The singular value approach 
is an extension of the idea of local intrinsic dimensionality introduced by Fukunaga & 
Olsen (1971). The underlying idea is to use a local linear approximation to the nonlinear 
evolution governing the dynamics. Given an embedding dimension d, a number M of local 
centres is selected randomly on the attractor. For each of these local centres $x_l$, the q 
nearest neighbours are retained and organised in a $d \times q$ matrix as 

$$
X = (x_1, x_2, x_3, \ldots, x_q). \qquad (5)
$$

The rank of X is determined using SVD. The number of significant singular values 
(equivalently, of significant eigenvalues of $XX^{T}$) gives the dimensionality of the 
corresponding data space; it represents the local intrinsic dimension of the attractor at that 
local centre in phase space. The singular value spectrum contains both signal subspace and 
noise subspace information. In the absence of noise, the limiting smallest value of the 
singular values depends on the number of nearest neighbours used to construct X around the 
local centre. Therefore, a certain minimum number of nearest neighbours is essential to 
obtain significant singular values for estimating the local intrinsic dimension at a local 
centre. Similarly, an adequate number of local centres is essential to cover the entire 
phase-space map for a reliable estimate of the attractor dimension (Pike 1987). 

The question of deciding which singular values are significant can also be a problem. 
The point where one places the threshold between signal and noise must be carefully 
chosen. The selection of the threshold is a matter of judgement which in some cases 
may be difficult. More objective methods for setting the threshold are being investigated. 
The difficult problem is the separation of the noise singular values from the signal singular 
values, particularly for low signal to noise ratios (SNR). The selection of the threshold will, 
in general, depend on the distribution or statistics of the noise singular values. However, at 
moderate SNR the number of significant singular values represents the dimension of the signal 
subspace. The significant singular values may be obtained by using a threshold criterion 
of 30% of the maximum of the singular value spectrum. 

Once the singular values have been estimated for the various local centres on the attractor, 
the average of all local intrinsic dimensions is calculated as the attractor dimension: 

$$
D_2 = P_1 s_1 + P_2 s_2 + \cdots + P_r s_r, \qquad (6)
$$

where $s_1 = 1, s_2 = 2, \ldots, s_r = r$, and r is the maximum number of significant singular values 
over the M local centres. $P_i$ is the estimated probability of $s_i$ occurring, i.e. $P_i = h_i/M$, 
with $h_i$ the number of local regions that have dimension $s_i$ and M the total number of 
local regions. 

The SVD approach has certain advantages. It is computationally less intensive and can be 
applied to both small and large data sets, although the results are more accurate with 20000 
or more data points. It can be implemented on a workstation and does not require expensive 
high-performance computing platforms. A consistent criterion for determining the threshold 
for significant singular values may be applied instead of an arbitrary scaling region as in the 
case of GPA. On the same system it takes only 40 seconds to calculate the attractor dimension 
of 40000 points with an embedding dimension of 30, M = 50 local centres on the attractor and 
q = 50 nearest neighbouring vectors at each local centre. The method gives a reliable estimate 
of attractor dimension for noisy data, and it has the potential to be implemented on-line in 
EEG monitoring laboratories. 
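The following is a pure-Python sketch of the SVD approach of (5) and (6), written without NumPy so that it is self-contained. With NumPy, `numpy.linalg.svd(X, compute_uv=False)` would give the singular values directly. Here they are obtained as square roots of the eigenvalues of $XX^{T}$, computed with cyclic Jacobi rotations. The sketch makes two assumptions that the paper does not state:
- The neighbour vectors are taken relative to the local centre before the decomposition.
- The centre itself is excluded from its q neighbours.

The 30% threshold and the defaults M = q = 50 follow the text. All function names are illustrative.

```python
import math
import random


def jacobi_eigenvalues(a, max_sweeps=50):
    """Eigenvalues of a real symmetric matrix by cyclic Jacobi rotations."""
    n = len(a)
    a = [row[:] for row in a]
    total = sum(x * x for row in a for x in row)  # invariant under rotations
    for _ in range(max_sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off <= 1e-24 * total:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                theta = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for k in range(n):  # A <- A J (columns p, q)
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):  # A <- J^T A (rows p, q)
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
    return [a[i][i] for i in range(n)]


def local_dimension(points, centre, q, threshold=0.3):
    """Number of singular values of the d x q neighbour matrix X in (5) that are
    at least threshold times the largest one."""
    c = points[centre]
    nearest = sorted((i for i in range(len(points)) if i != centre),
                     key=lambda i: math.dist(points[i], c))[:q]
    cols = [[x - y for x, y in zip(points[i], c)] for i in nearest]  # relative to centre
    d = len(c)
    gram = [[sum(col[u] * col[v] for col in cols) for v in range(d)] for u in range(d)]
    sv = sorted((math.sqrt(max(ev, 0.0)) for ev in jacobi_eigenvalues(gram)), reverse=True)
    return sum(1 for s in sv if s >= threshold * sv[0])


def svd_attractor_dimension(points, M=50, q=50, threshold=0.3, seed=0):
    """Eq. (6): sum_i P_i s_i with P_i = h_i / M, i.e. the mean local dimension."""
    centres = random.Random(seed).sample(range(len(points)), M)
    dims = [local_dimension(points, i, q, threshold) for i in centres]
    return sum(dims) / M


line = [(t, 2.0 * t, 3.0 * t) for t in range(100)]              # 1-D set in 3-D
plane = [(i, j, i + j) for i in range(20) for j in range(20)]   # 2-D set in 3-D
line_dim = svd_attractor_dimension(line, M=10, q=10)  # 1.0
plane_dim = local_dimension(plane, 10 * 20 + 10, q=12)  # 2 at an interior centre
```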


4. Application of SVD approach to model data 

The well known Henon map and Lorenz map are classical examples of chaotic systems. The 
SVD approach is presented for the Lorenz and Henon attractors. The Lorenz system is 
given by 

$$
\begin{aligned}
\dot{x} &= c y - c x,\\
\dot{y} &= r x - y - x z,\\
\dot{z} &= -b z + x y,
\end{aligned}
$$

where c = 10, b = 8/3 and r = 28. Gill's routine was used for integration with a step size 
of 0.006. The initial conditions x(0) = y(0) = z(0) = 1 were used for the Lorenz system; 
200000 data points were generated, of which the initial 2000 points were discarded to 
remove the initial transients. 

The Henon map is given by 

$$
x_{i+1} = 1 - a x_i^2 + b x_{i-1},
$$



where a = 1.4 and b = 0.3. The Henon map data, consisting of 900000 points, were 
generated using the initial conditions $x_i = x_{i-1} = 0$. The initial 2000 data points were 
discarded to avoid transients at the beginning of the data. 

Figure 1. Time plot of Lorenz map. White Gaussian noise has been added to 2048 data 
points: (a) infinite, (b) 30 dB, (c) 20 dB, (d) 10 dB and (e) 0 dB SNR. 

Figure 2. Phase-space plots of Lorenz map with additive noise: (a) infinite, (b) 30 dB, 
(c) 20 dB, (d) 10 dB and (e) 0 dB SNR. 
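Generating the model data can be sketched as follows. `gill_step` implements the classical Runge-Kutta-Gill coefficients, which are assumed to be what "Gill's routine" refers to. The Lorenz right-hand side uses the standard form with $\dot z = -bz + xy$. Returning only the x component is an assumption about which coordinate was analysed. The short usage runs are illustrative and far smaller than the data sets used in the paper.

```python
import math


def gill_step(f, y, h):
    """One Runge-Kutta-Gill step for dy/dt = f(y), with y a list of floats."""
    r = 1.0 / math.sqrt(2.0)
    k1 = [h * v for v in f(y)]
    k2 = [h * v for v in f([yi + 0.5 * a for yi, a in zip(y, k1)])]
    k3 = [h * v for v in f([yi + (r - 0.5) * a + (1 - r) * b
                            for yi, a, b in zip(y, k1, k2)])]
    k4 = [h * v for v in f([yi - r * b + (1 + r) * c
                            for yi, b, c in zip(y, k2, k3)])]
    return [yi + (a + 2 * (1 - r) * b + 2 * (1 + r) * c + d) / 6
            for yi, a, b, c, d in zip(y, k1, k2, k3, k4)]


def lorenz_series(n, h=0.006, discard=2000, c=10.0, b=8.0 / 3.0, r=28.0):
    """x component of the Lorenz system, integrated from x = y = z = 1."""
    def f(s):
        x, y, z = s
        return [c * y - c * x, r * x - y - x * z, -b * z + x * y]

    state, out = [1.0, 1.0, 1.0], []
    for i in range(discard + n):
        state = gill_step(f, state, h)
        if i >= discard:
            out.append(state[0])
    return out


def henon_series(n, a=1.4, b=0.3, discard=2000):
    """x_{i+1} = 1 - a x_i^2 + b x_{i-1}, starting from x_i = x_{i-1} = 0."""
    x_prev, x, out = 0.0, 0.0, []
    for i in range(discard + n):
        x_prev, x = x, 1.0 - a * x * x + b * x_prev
        if i >= discard:
            out.append(x)
    return out


growth = [1.0]  # accuracy check: dy/dt = y over t = 1 should give e
for _ in range(100):
    growth = gill_step(lambda s: [s[0]], growth, 0.01)
h3 = henon_series(3, discard=0)  # [1.0, -0.4, 1.076]
xs = lorenz_series(3000)
```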

We have used data segments of varying length (512, 1024, 2048, 4096, 8192, 16384 and 20480 points). Phase-space vectors were created with an embedding dimension of 30. M = 50 local centres were randomly chosen on the attractor, and the q = 50 nearest neighbouring points around each local centre were obtained. The matrix X was generated from the nearest-neighbour points of a given local centre using (5), and the SVD of X gave the singular values. Using a threshold criterion of 30% of the maximum singular value, the number of significant singular values was determined for each local centre, and the attractor dimension was determined as per (6). The estimations have been carried out on the Henon and Lorenz maps with various levels of additive noise (SNR of infinity, 30 dB, 20 dB, 10 dB and 0 dB). A Gaussian random sequence of zero mean and unit variance was generated and added to the model data for the various levels of SNR, defined as

$$\mathrm{SNR\ (dB)} = 10 \log_{10}\left[\frac{\text{signal power}}{\text{noise power}}\right]. \qquad (9)$$
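The estimation procedure above, together with noise addition at a prescribed SNR as in (9), can be sketched as follows. Equations (5) and (6) are not reproduced in this section, so several details are assumptions: the local matrix X is taken as the q neighbour vectors minus the local centre, scaled by $1/\sqrt{q}$; the embedding delay is one sample; the dimension is the mean count of significant singular values over the M centres; and "signal power" is taken as the signal variance. All function names are hypothetical.

```python
import numpy as np

def add_noise_at_snr(signal, snr_db, rng):
    """Add zero-mean, unit-variance Gaussian noise rescaled to the SNR of (9).
    Assumption: 'signal power' is the signal variance."""
    signal = np.asarray(signal, dtype=float)
    if np.isinf(snr_db):
        return signal.copy()                     # infinite SNR: clean data
    noise = rng.standard_normal(signal.shape)
    p_signal = np.var(signal)
    p_noise = np.mean(noise ** 2)
    scale = np.sqrt(p_signal / (p_noise * 10.0 ** (snr_db / 10.0)))
    return signal + scale * noise

def delay_embed(x, dim, tau=1):
    """Rows are phase-space vectors [x_t, x_{t+tau}, ..., x_{t+(dim-1)tau}]."""
    n = len(x) - (dim - 1) * tau
    return np.stack([x[k * tau : k * tau + n] for k in range(dim)], axis=1)

def svd_dimension(x, dim=30, n_centres=50, q=50, rel_threshold=0.3, rng=None):
    """Mean number of local singular values >= rel_threshold * largest one."""
    rng = np.random.default_rng(0) if rng is None else rng
    vectors = delay_embed(np.asarray(x, dtype=float), dim)
    centres = rng.choice(len(vectors), size=n_centres, replace=False)
    counts = []
    for c in centres:
        dists = np.linalg.norm(vectors - vectors[c], axis=1)
        nbrs = np.argsort(dists)[1 : q + 1]             # q nearest neighbours, centre excluded
        X = (vectors[nbrs] - vectors[c]) / np.sqrt(q)   # assumed form of the local matrix
        s = np.linalg.svd(X, compute_uv=False)          # singular values, descending
        counts.append(np.count_nonzero(s >= rel_threshold * s[0]))
    return float(np.mean(counts))
```

For a pure sinusoid every delay vector lies in a two-dimensional subspace, so the estimate is bounded between 1 and 2; this is a quick sanity check before applying the estimator to Lorenz, Henon or EEG data.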

The sample data segments and their phase-space plots for the Henon and Lorenz maps with additive noise are given in figures 1-4. The saturation of the dimension estimate with an increasing number of local centres for the Lorenz map is shown in figure 5. It shows that the D2 value may be reliably estimated by fixing M and q at 50.

The results of the SVD method for model data are summarised in tables 1 and 2, which depict the relationships of data length, noise level and number of local centres with the attractor dimension. The values are the averages of the dimensions calculated on consecutive data segments.

Figure 3. Time plot of Henon map. White Gaussian noise has been added to 2048 data points: (a) infinite, (b) 30 dB, (c) 20 dB, (d) 10 dB and (e) 0 dB SNR.


5. Application of SVD approach to EEG data

Oscillations in single neurons and neuronal ensembles underlie the generation of various pattern features in the EEG. Since mathematicians have long known that the periodic forcing of nonlinear oscillators can give rise to complex phase-locking patterns, bifurcations and aperiodic dynamics (Hayashi et al 1982), one anticipates that such behaviour might be observable in forced neuronal oscillators. Studies on the periodic forcing of biological oscillators have, in fact, been interpreted in the context of chaotic dynamics (Hayashi et al 1982). It has been proposed that the complex EEG patterns which occur normally arise from interactions between a large number of neural relaxation oscillators (Baser 1980). All these observations raise the possibility that some of the observed variability in neural electrical activity may be a reflection of intrinsically chaotic dynamics. Thus the concept of chaos introduces a perspective for the analysis of neural dynamics, and this has been the motivation for the present study. The EEG, a complex pattern generated in the brain by a possibly chaotic process and seemingly very irregular and random-like in character, is in fact a good candidate for the application of chaotic dynamics. Estimation of the attractor dimension is a fundamental measurement for characterising chaotic systems. Keeping this in view, we have applied the SVD method to estimate the attractor dimensions of the changing patterns in the EEG.

Figure 4. Phase-space plots of Henon map with additive noise: (a) infinite, (b) 30 dB, (c) 20 dB, (d) 10 dB and (e) 0 dB SNR.


5.1 Recording and digitisation of EEG 

The EEG signals from 8 volunteers were recorded with Nihon Kohden EEG amplifiers. Four channels of unipolar EEG, F3 (ch-1), F4 (ch-2), O1 (ch-3) and O2 (ch-4), referenced to A2, were obtained for durations varying from 10 to 15 minutes. The subjects were instructed to close their eyes for some time during the data acquisition period and were allowed to sleep. The signals were digitised at 128 samples/second/channel. Data acquisition was accomplished with a 12-bit DT-2841 ADC coupled with a DT-7020 array processor (Data Translation Inc, MA, USA) in a PC-AT computer. The data were then transferred over a network (off-line) to an HP9000/735 graphics workstation for further analysis. The raw EEG signals were filtered through a bandpass (0.25-32 Hz) 4th-order Butterworth filter, cascaded twice. The data were scanned for a specific activity, and 25600 consecutive points having a given pattern of EEG activity were extracted. The extracted EEG segments were then processed for quantification of the attractor dimension. The time series plots of various EEG activities are shown in figure 6.

Figure 5. Saturation curve of D2 with increasing number of local centres (M) for Lorenz map (N = 4096, q = 50).
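The pre-filtering step can be sketched with SciPy. Reading "cascaded twice" as two successive causal passes of the same Butterworth bandpass filter is an assumption (a forward-backward, zero-phase pass with `scipy.signal.sosfiltfilt` is another plausible reading), and `eeg_bandpass` is a hypothetical helper.

```python
import numpy as np
from scipy.signal import butter, sosfilt

FS = 128.0  # samples/second/channel, as in the text

def eeg_bandpass(x, fs=FS, low=0.25, high=32.0, order=4, passes=2):
    """Butterworth bandpass (low-high Hz) applied `passes` times in series."""
    sos = butter(order, [low, high], btype="bandpass", output="sos", fs=fs)
    y = np.asarray(x, dtype=float)
    for _ in range(passes):
        y = sosfilt(sos, y)  # causal filtering in second-order sections
    return y
```

With `btype="bandpass"`, `butter(4, ...)` produces an 8th-order bandpass design; the text does not state whether "4th order" refers to the prototype or to the complete bandpass filter.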

5.2 Estimation of attractor dimension of EEG data 

The dynamical behaviour of the brain is currently being viewed from the perspective of nonlinear dynamics. There are several reports of low-dimensional chaotic activity in various states of human behaviour (Baser 1980). While the dimension estimate provides a measure for classifying various brain activities, the Lyapunov exponent may be seen as a powerful estimate of the dynamics of the system, reflecting the long-time average exponential rates of divergence or convergence of nearby trajectories in state space. If a system has at least one positive Lyapunov exponent, then the system is chaotic. This criterion stems from the premise that chaotic systems are highly sensitive to initial conditions: small changes in the state of a chaotic system grow exponentially and dominate the system behaviour (Wolf et al 1985). Since an error bar is composed of many adjacent states in the solution space or phase space, and adjacent states diverge quickly, it follows that error bars on the initial conditions of chaotic systems grow exponentially fast. Error bars on initial conditions are omnipresent, so it may be concluded that long-term prediction of chaotic systems is futile, no matter how the prediction is implemented. The dominant Lyapunov exponent (λ1) has also been evaluated by applying the algorithm of Wolf et al (1985). The λ1 values of the different EEG activities are all positive: λ1 is 0.143 ± 0.01 for alpha activity, 1.801 ± 0.10 for beta activity, and 0.162 ± 0.13, 0.135 ± 0.01 and 1.102 ± 0.01 for theta, delta and indeterminate activities respectively. This implies that the EEG is chaotic.

Figure 6. Time plot of (a) alpha, (b) beta, (c) theta, (d) delta and (e) indeterminate EEG activity of 8 s duration.
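Wolf's algorithm estimates λ1 directly from a time series and is too long to reproduce here. As a sanity check of the idea that a positive largest exponent signals chaos, the sketch below instead uses a different, model-based technique: it propagates a tangent vector with the Jacobian of the Henon map (in its two-dimensional form $x_{n+1} = 1 - a x_n^2 + y_n$, $y_{n+1} = b x_n$), renormalises it at every step and averages the logarithmic growth per iteration. The function name is hypothetical.

```python
import math

def henon_largest_lyapunov(n_iter=100_000, a=1.4, b=0.3, n_discard=1000):
    """Largest Lyapunov exponent of the Henon map via the tangent (Jacobian) map."""
    x, y = 0.0, 0.0
    vx, vy = 1.0, 0.0
    total = 0.0
    for i in range(n_discard + n_iter):
        # Tangent map J = [[-2 a x, 1], [b, 0]], evaluated at the current point.
        vx, vy = -2.0 * a * x * vx + vy, b * vx
        x, y = 1.0 - a * x * x + y, b * x
        norm = math.hypot(vx, vy)
        vx, vy = vx / norm, vy / norm
        if i >= n_discard:
            total += math.log(norm)
    return total / n_iter

lam = henon_largest_lyapunov()
```

For a = 1.4 and b = 0.3 the result is close to the commonly quoted value of about 0.42 (natural-log units per iteration), i.e. positive, as expected for a chaotic map.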

The attractor dimensions for varying lengths of data (512, 1024, 2048, 4096, 8192, 16384 and 20480 points) obtained for alpha, beta, theta, delta and indeterminate activities are given in table 3. The phase-space plots of the various EEG activities are presented in figure 7. As in the earlier case, q = 50 nearest neighbours have been used in computing the dimension for M = 50, M = 100 and M = 200 local centres, with a maximum embedding dimension of 30. From the results for the Lorenz and Henon maps, it is evident that 4096 to 8192 data points are the optimum data length for M = 50 local centres. Here we have used 4096 data points as the optimum length for estimating the attractor dimension.






6. Results and discussion

The analysis of EEG over the past several decades has followed the phenomenological approach, in which the EEG is seen as a band-limited signal produced by some black box driven by an unknown or white Gaussian noise input. More recently, a model-based approach incorporating concepts developed in the areas of nonlinear dynamics and chaos theory has been used. The application of nonlinear dynamics to EEG analysis may provide information for understanding the underlying neurodynamics of EEG generation and its evolution. Estimation of the attractor dimension is a primary step in this direction. The GPA has been at the forefront of the computational procedures for obtaining the dimension of the attractor. For the reasons already mentioned, an alternative method suitable for EEG is called for. We have described a method that applies the singular value spectrum to estimating the dimension of the attractor. It was essential to apply the method to model data in order to determine the data lengths suitable for analysis.

Figure 7. Phase-space plots of (a) alpha, (b) beta, (c) theta, (d) delta and (e) indeterminate EEG activities.

We have presented the SVD method for estimating the attractor dimension of model data, from which we extract information about the data length required for a given number of local centres (M) and number of nearest neighbours (q). The three parameters M, q and N influence the dimension estimate. Therefore, for an optimal evaluation of the dimension, two of the parameters may have to be fixed while the third is varied. It can be seen that the method is suitable for small data sets and that a suitable estimate of the dimension may be obtained with N = 4096-8192 while M and q are fixed at 50.

Without additive noise, 512 data points of the Lorenz map yield an attractor dimension of 2.443, which is higher than the theoretically expected dimension (2.01). The expected value is reached when the number of data points is about 4096. With an increase in the number of points beyond 4096, M = 50 local centres are not adequate to cover the entire phase space; therefore there is a drop in the value of the dimension. When 50 local centres are used on small data sets, there may be overlap among the attractor zones, thereby producing a higher estimate of the dimension. With the addition of 0 dB noise, the initial estimate of the dimension is 8.260 for 512 points and 5.423 for 4096 points. The phase-space plots (figures 2 and 4) maintain the characteristic forms of the maps up to 10 dB SNR; the saddle-shaped appearance of the Henon map is still apparent at 10 dB SNR. At 0 dB SNR the structures of the maps are lost, so a spurious estimate of the dimension is encountered because the noise invades the entire phase space. Such fallacious estimates can be seen in table 1, where the dimension value of 2.059 at 20480 data points happens to approximate the expected value. This implies that as the number of data points increases, the number of local centres needs to be increased. The estimations have therefore been repeated for 100 and 200 local centres on the attractor. It is apparent that a choice of 50 local centres suffices only for about 4096 points. Our experiments indicate that the SVD estimate approaches the expected value with 30 dB noise. Even at the 10 dB noise level, the dimension value for 4096 points is 2.179, which does not deviate much from the expected value. Similar results are seen for the Henon map in table 2, where data lengths of 8192 points yield values that are close to the expected value. The use of SVD to separate the signal space from the noise space is well known in signal processing. At moderate SNR (up to 10 dB), the SVD approach differentiates the signal subspace from the noise subspace through the significant differences in their singular values. The SVD approach may therefore be suitable for signals like EEG, and this property is utilised to evaluate the dimension of EEG data.

Table 1. Attractor dimension values for Lorenz map.

SNR     Data length   Attractor dimension
                      (M = 50)   (M = 100)   (M = 200)
∞       512           2.443      2.417       2.498
        1024          2.510      2.558       2.579
        2048          2.257      2.174       2.156
        4096          2.033      1.860       1.879
        8192          1.783      1.810       1.873
        16384         1.520      1.470       1.580
        20480         1.435      1.440       1.435
30 dB   512           2.442      2.506       2.494
        1024          2.514      2.560       2.602
        2048          2.272      2.126       2.141
        4096          2.029      1.900       1.896
        8192          1.774      1.830       1.870
        16384         1.518      1.520       1.525
        20480         1.433      1.460       1.455
20 dB   512           2.519      2.576       2.528
        1024          2.518      2.625       2.597
        2048          2.282      2.164       2.138
        4096          2.019      1.932       1.877
        8192          1.767      1.705       1.790
        16384         1.501      1.560       1.510
        20480         1.411      1.438       1.420
10 dB   512           2.875      2.744       2.777
        1024          2.772      2.881       2.918
        2048          2.510      2.294       2.299
        4096          2.179      2.066       2.064
        8192          1.842      1.965       1.858
        16384         1.486      1.580       1.558
        20480         1.374      1.430       1.439
0 dB    512           8.260      8.136       8.133
        1024          7.666      7.979       7.969
        2048          6.883      6.629       6.676
        4096          5.423      5.424       5.328
        8192          3.842      4.165       3.948
        16384         2.449      2.360       2.435
        20480         2.059      1.992       2.025

Table 2. Attractor dimension values for Henon map.

SNR     Data length   Attractor dimension
                      (M = 50)   (M = 100)   (M = 200)
∞       512           6.214      6.222       6.232
        1024          4.318      4.314       4.336
        2048          2.805      2.818       2.839
        4096          1.787      1.768       1.776
        8192          1.254      1.242       1.268
        16384         1.077      1.075       1.093
        20480         1.056      1.050       1.040
30 dB   512           6.220      6.210       6.210
        1024          4.319      4.312       4.274
        2048          2.809      2.798       2.785
        4096          1.785      1.750       1.803
        8192          1.254      1.265       1.248
        16384         1.078      1.070       1.105
        20480         1.050      1.020       1.032
20 dB   512           6.291      6.279       6.260
        1024          4.373      4.373       4.359
        2048          2.836      2.788       2.857
        4096          1.795      1.760       1.796
        8192          1.255      1.270       1.243
        16384         1.075      1.090       1.080
        20480         1.048      1.060       1.060
10 dB   512           6.880      6.873       6.864
        1024          4.865      4.852       4.842
        2048          3.198      3.197       3.193
        4096          1.993      2.026       1.996
        8192          1.328      1.290       1.347
        16384         1.094      1.080       1.100
        20480         1.061      1.060       1.062
0 dB    512           9.321      9.340       9.348
        1024          8.000      8.024       8.024
        2048          6.409      6.436       6.479
        4096          4.790      4.764       4.838
        8192          3.274      3.335       3.285
        16384         2.075      1.970       1.940
        20480         1.779      1.720       1.715

Table 3. Attractor dimension values for different EEG activities.

EEG             Data length   Attractor dimension
                              (M = 50)   (M = 100)   (M = 200)
Delta           512           4.642      4.641       4.637
                1024          4.605      4.610       4.623
                2048          4.606      4.555       4.575
                4096          4.416      4.430       4.420
                8192          4.490      4.210       4.275
                16384         3.973      3.993       3.998
                20480         3.897      3.923       3.922
Theta           512           7.568      7.585       7.620
                1024          7.387      7.332       7.322
                2048          7.062      7.201       7.158
                4096          6.620      6.685       6.770
                8192          6.216      6.326       6.376
                16384         5.770      5.880       5.985
                20480         5.710      5.785       5.800
Alpha           512           10.570     10.587      10.747
                1024          10.066     10.184      10.050
                2048          8.382      8.589       8.664
                4096          7.320      7.398       7.370
                8192          6.391      6.401       6.411
                16384         5.880      5.923       5.921
                20480         4.921      5.103       5.213
Beta            512           11.491     11.679      11.689
                1024          11.030     11.030      11.050
                2048          10.060     9.997       10.010
                4096          9.340      9.270       9.340
                8192          8.570      8.890       8.610
                16384         8.300      7.900       7.860
                20480         7.600      7.680       7.820
Indeterminate   512           8.616      8.664       8.638
                1024          8.970      8.960       8.990
                2048          8.540      8.570       8.582
                4096          8.050      7.970       7.920
                8192          7.290      7.290       7.387
                16384         6.920      6.920       6.970
                20480         6.820      6.440       6.660

Different patterns of EEG activity have been analysed with the SVD method. We have presented the analysis results for alpha, beta, theta, delta and indeterminate activities. Data segments of varying length (512, 1024, 2048, 4096, 8192, 16384 and 20480 points) have been used to see whether our primary assumption, that the record length can be determined from the model data, is valid. The EEG results follow a pattern similar to that of the chaotic Lorenz and Henon signals: the attractor dimension values are high for small data sets (512 points) and decrease with increasing data length. For delta activity, the dimension values range from 4.642 to 3.897 for 512 to 20480 data points with 50 to 200 local centres. Using the data-length criterion from the Lorenz and Henon maps, the dimension value may be fixed at 4.416 for 4096 points and 50 local centres. The attractor dimension thus determined by the SVD approach for delta activity, 4.416, is similar to the range reported in the literature. Alpha activity has a value of 7.32 for 4096 data points. The attractor dimensions for beta, theta and indeterminate EEG activities are 9.34, 6.62 and 8.05 respectively. Different patterns may thus be discerned by their attractor dimension values. It may be seen (figure 6) that beta and indeterminate activities are more random-like, whereas the delta, alpha and theta patterns of EEG tend to be periodic. This is more apparent from the phase-space plots (figure 7), in which the limit-cycle-like behaviour of delta and theta may be seen. The degree of complexity of the signal may be qualitatively inferred from its phase-space trajectory. The EEG loses complexity as one moves from the relaxed state of alpha to a state of deep sleep dominated by delta. Such loss of complexity of the EEG may also be encountered in seizure discharges and degenerative conditions of the brain. The neurobiological significance of low and high values of the attractor dimension, which reflect the degree of complexity, needs to be determined by more empirical observations in different brain conditions. This paper only reflects the need for a suitable method that is computationally efficient enough for real-time or on-line applications in neurobiological investigations of the brain. We have presented data only for known patterns of EEG activity, to show that the SVD method can be reliably applied to EEG. These estimates are within the acceptable ranges of dimension values, in keeping with the degree of complexity of the signals. The SVD method for obtaining the attractor dimensions of different EEG activities presented in this study may have potential applications in the feature detection of EEG patterns. It may also help in understanding the underlying dynamics of the neuronal processes that generate the EEG.


7. Conclusions

The present study suggests that the attractor dimension may be an effective measure of the degree of complexity of the data and may offer a way to detect features in the EEG. The SVD method is effective with small data sets at moderate SNR; hence, it may be considered a preferred method of determining the attractor dimension of EEG time series. The complex, disorganised and random-like patterns in EEG, such as beta and indeterminate activities, have higher dimension values and may thus reflect a greater degree of complexity in their generating processes. However, the neurobiological significance of high and low dimension values in different brain states remains to be determined. Further studies are needed to establish correlations between attractor dimension values and physiological functions. Knowledge of the nature of the attractors of neuronal systems may in future be of great help in understanding the dynamical properties of the brain. In this study, we have seen that complex and more random-like EEG activities have higher attractor dimensions: the quasi-periodic alpha, delta and theta patterns have lower dimension values than beta and indeterminate activities. The crucial role of the length of data with respect to the number of local centres in the calculation of the attractor dimension has been emphasised.

The chaotic model of EEG generation, i.e. the notion that simple nonlinear systems can produce complex, almost random-looking outputs, is very appealing. The fact that the EEG appears unpredictable, yet is bounded to a limited frequency and amplitude range with a few basic rhythms and waveforms, points to the chaotic aspect of the brain's electrical activity. The EEG has been analysed here keeping in mind its nonlinear deterministic nature.
Morphological processing of multichannel images

BALVINDER SINGH and M U SIDDIQI 

Department of Electrical Engineering, Indian Institute of 
Technology, Kanpur 208016, India 

Abstract. A theory for morphological processing of multichannel images within the abstract framework of lattice theory is presented. The theory makes use of marginal ordering and reduced ordering schemes to arrive at multivariate morphological operators. This paper elucidates the algebraic structure and properties of mathematical multivariate morphology. It is shown that the matrix morphology formulation is a natural consequence of marginal ordering and serves as a technique for processing multichannel images. Further, the concepts of quasi-ordering and complete quasi-lattices are introduced to define morphological operators utilizing the reduced ordering scheme.

Keywords. Multichannel images; mathematical multivariate morphology; multivariate ordering; lattice theory; matrix operators; operators over quasi-lattices.


1. Introduction

Mathematical morphology is a theory concerned with the processing and analysis of the topological and geometrical aspects of an image. It originated as a set-theoretical formulation for processing binary images (Matheron 1975; Serra 1982). Subsequently, these methods were extended to the processing of gray-scale images (Sternberg 1986). However, it was soon realized by Serra (1988), Ronse (1990), Heijmans & Ronse (1990) and Ronse & Heijmans (1991) that mathematical morphology can be generalized within the abstract framework of complete lattices. An excellent treatment of recent developments in mathematical morphology is available in the text by Heijmans (1994).

Many varied and important applications, such as processing of colour images, multispectral image analysis, biomedical imaging, robot vision and industrial inspection, require multichannel image processing. In these applications, image data from multiple frequency bands, multiple time frames, multiple colours or multiple sensors (e.g. optical, radar, range) are of great use. In such situations, the information is available in the form of multivariate data, which should be processed so as to take into account the interrelationships between the individual variates (image frames). A formulation based on the 3-dimensional geometrical structure of image sequences is presented by Cheng & Venetsanopoulos (1992). Another approach has been to decorrelate the various signal components by means of a suitable linear transformation and then apply the morphological operators separately (Eo 1992).

We approach this problem by extending univariate morphology to multivariate morphology through the use of ordering principles for multivariate data. Several multivariate ordering techniques have been proposed and discussed (Barnett 1976). However, only the marginal ordering and reduced ordering schemes are of interest to us. In marginal ordering, the components of the multivariate data are ordered componentwise. In reduced ordering, on the other hand, the multivariate data are mapped to completely ordered univariate data using some distance criterion, and the ordering is then performed on the basis of the univariate data. These ordering principles have been applied to develop nonlinear multivariate order statistic and vector median filters (Astola et al 1990; Hardie & Arce …; Pitas & Tsakalides 1991; Trahanias & Venetsanopoulos 1993). During the preparatory stages of this paper, the authors became aware of the work by Goutisias et al (1994). Their work deals with a lattice-theory approach to multichannel morphological image processing (termed vector morphology by Goutisias et al (1994)) and includes examples demonstrating its applications. This paper presents parallel theoretical results and supplements the work of Goutisias et al (1994). Due to limited space, we will not discuss any background theory, and no proofs of the results presented will be given. For an overview, however, readers may refer to the texts on lattice theory by Birkhoff (1973), … (1963) and Gierz et al (1980), and on mathematical morphology by Heijmans (1994) and Serra (1982, 1988). Readers interested in the details of the work presented here may refer to Singh & Siddiqi (1994).
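The difference between the two ordering schemes can be illustrated on pixel vectors. In the minimal sketch below (hypothetical helper names), marginal ordering compares and combines channels componentwise, so the supremum of a set of pixels need not be one of the pixels; reduced ordering ranks whole vectors by a scalar, here the Euclidean distance to a reference vector, which is only one possible choice of distance criterion.

```python
import numpy as np

def marginal_leq(f, g):
    """Marginal (componentwise) order: F <= G iff f_i <= g_i in every channel i."""
    return bool(np.all(np.asarray(f) <= np.asarray(g)))

def marginal_sup(vectors):
    """Componentwise supremum of a finite set of pixel vectors."""
    return np.max(np.asarray(vectors), axis=0)

def reduced_order(vectors, ref):
    """Reduced ordering: indices of the vectors sorted by distance to `ref` (assumed criterion)."""
    v = np.asarray(vectors, dtype=float)
    dist = np.linalg.norm(v - np.asarray(ref, dtype=float), axis=1)
    return np.argsort(dist, kind="stable")

pixels = np.array([[10, 200, 30], [50, 40, 90], [20, 60, 25]])  # three 3-channel pixels
sup_pixel = marginal_sup(pixels)            # (50, 200, 90): not one of the input pixels
ranking = reduced_order(pixels, [0, 0, 0])  # ranks the actual pixels, nearest first
```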

This paper is organized as follows. Section 2 gives a rigorous treatment of multivariate morphology over general complete lattices within the marginal ordering scheme. Section 3 deals with translation-invariant multivariate morphological operators and their properties. Section 4 addresses the problem of morphological operators utilizing reduced ordering techniques. The conclusions are presented in §5.


2. Multivariate morphological operators 

2.1 Structure of multivariate signal 

Let us consider a multichannel image (which is a multivariate signal) that is specified as an indexed set of image frames. Let $I$ represent such an indexing set; for the discussion to follow, $I = \{1, 2, \ldots, m\}$. The range set for the $i$th image channel is denoted by $\mathcal{G}_i$ for $i \in I$. Then the cartesian product set

$$\mathcal{G} = \prod_{i \in I} \mathcal{G}_i = \mathcal{G}_1 \times \mathcal{G}_2 \times \cdots \times \mathcal{G}_m = \{ G = [g_1, g_2, \ldots, g_m] \mid g_i \in \mathcal{G}_i,\ i \in I \} \qquad (1)$$

is the range set for the multichannel image. Let a partial order relation $\le$ be defined on each of the sets $\mathcal{G}_i$.

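DEFINITION 1

The cartesian product set $\mathcal{G}$ is a poset under the relation $\preceq$, which is defined for all $F, G \in \mathcal{G}$ as $F \preceq G$ if and only if $f_i \le g_i$ for $f_i, g_i \in \mathcal{G}_i$ and all $i \in I$.

A $\mathcal{G}$-valued image $G$ on $E$ is a member of $\mathcal{L} = \mathcal{G}^E$ and is a map of the form $G : E \to \mathcal{G}$,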


$$G = \{ G(x) = [g_1(x), g_2(x), \ldots, g_m(x)] \mid g_i(x) \in \mathcal{G}_i\ \forall i \in I,\ x \in E \}. \qquad (2)$$

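Hence the set $\mathcal{L}$ of all multivariate functions over the domain space $E$ and range space $\mathcal{G}$ represents multichannel images and is given by

$$\mathcal{L} = \mathcal{G}^E = \Big(\prod_{i \in I} \mathcal{G}_i\Big)^E = \prod_{i \in I} \mathcal{G}_i^E = \prod_{i \in I} \mathcal{L}_i, \qquad (3)$$

where $\mathcal{L}_i = \mathcal{G}_i^E$.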

We will be dealing specifically with sets $\mathcal{G}_i$ that are complete lattices (in particular, complete chains). Therefore, $\mathcal{L}_i$ is also a complete lattice (also a complete chain). We will henceforth call the $\mathcal{L}_i$ component lattices, as they are the factors of the cartesian product set.

The following theorem (Birkhoff 1973) establishes the structure of the object space (i.e., the multichannel image under consideration). It should be noted that the manner in which the supremum and infimum operations are defined essentially induces marginal ordering on the multichannel image.

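Theorem 1. The direct product $\mathfrak{L} = (\mathcal{L}, \sqcup, \sqcap)$ of component lattices $\mathfrak{L}_i = (\mathcal{L}_i, \vee, \wedge)$ for $i \in I$ is a complete lattice where, for all $F, G \in \mathcal{L} = \prod_{i \in I} \mathcal{L}_i$, we have:

$$F \sqcup G = [f_1 \vee g_1, f_2 \vee g_2, \ldots, f_m \vee g_m],$$
$$F \sqcap G = [f_1 \wedge g_1, f_2 \wedge g_2, \ldots, f_m \wedge g_m].$$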

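COROLLARY 1.

The direct product $\mathfrak{L} = (\mathcal{L}, \sqcup, \sqcap)$ of component chains $\mathfrak{L}_i = (\mathcal{L}_i, \vee, \wedge)$ for $i \in I$ is a complete chain.

COROLLARY 2.

The complete chain $\mathfrak{L} = (\mathcal{L}, \sqcup, \sqcap)$ is a completely distributive lattice.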
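In array terms, Theorem 1 says that the supremum and infimum of multichannel images are taken channel by channel and point by point. A minimal NumPy sketch, assuming images are stored as arrays whose first axis indexes the channels $i \in I$ and whose remaining axes index the domain $E$ (helper names are hypothetical):

```python
import numpy as np

def image_sup(F, G):
    """F sup G = [f_1 v g_1, ..., f_m v g_m], each v taken pointwise on E."""
    return np.maximum(F, G)

def image_inf(F, G):
    """F inf G = [f_1 ^ g_1, ..., f_m ^ g_m], each ^ taken pointwise on E."""
    return np.minimum(F, G)

def image_leq(F, G):
    """Marginal order on images: f_i(x) <= g_i(x) for every channel i and every x in E."""
    return bool(np.all(F <= G))

# Two 2-channel images on a 1 x 3 domain E; axis 0 indexes the channels.
F = np.array([[[1, 5, 2]], [[7, 0, 3]]])
G = np.array([[[4, 2, 2]], [[6, 1, 9]]])
S = image_sup(F, G)
T = image_inf(F, G)
```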

2.2 Matrix operators

Having established the basic algebraic structure of a multichannel image (that of a complete lattice), we proceed to characterize the structure and properties of mappings between such complete lattices. For the discussion in this section, we will therefore be concerned with complete lattices $\mathfrak{L}_a$, $\mathfrak{L}_b$, $\mathfrak{L}_c$ with the corresponding cartesian product sets $\mathcal{L}_a = \prod_{i \in I} \mathcal{L}_{a,i}$, $\mathcal{L}_b = \prod_{j \in J} \mathcal{L}_{b,j}$, $\mathcal{L}_c = \prod_{k \in K} \mathcal{L}_{c,k}$, and $I$, $J$, $K$ the indexing sets for the component sets $\mathcal{L}_{a,i}$, $\mathcal{L}_{b,j}$, $\mathcal{L}_{c,k}$. Here $I = \{1, 2, \ldots, m\}$, $J = \{1, 2, \ldots, n\}$ and $K = \{1, 2, \ldots, p\}$. To prevent any undue increase in notational complexity, we will use the same set of symbols for the operations in the three lattices defined above. We will use $\preceq$ ($\le$) for the order relation, $\sqcup$ ($\vee$) for the supremum and $\sqcap$ ($\wedge$) for the infimum in the direct product lattices (component lattices). The least and the greatest elements in $\mathcal{L}_a$ are $O_a = [O_{a,1}, O_{a,2}, \ldots, O_{a,m}]$ and $I_a = [I_{a,1}, I_{a,2}, \ldots, I_{a,m}]$ respectively (with $O_{a,i}$ and $I_{a,i}$ the least and the greatest elements in $\mathcal{L}_{a,i}$ for all $i \in I$). The least and the greatest elements in $\mathcal{L}_b$ and $\mathcal{L}_c$ are defined in a similar fashion.

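Let $\mathcal{M}_{a \mapsto b}$ represent the set of all mappings from $\mathcal{L}_a$ to $\mathcal{L}_b$. It should be noticed here that the elements of $\mathcal{M}_{a \mapsto b}$ are matrix operators, with each entry (element) of the matrix being an operator between component lattices. Let $\Psi \in \mathcal{M}_{a \mapsto b}$ be an $m \times n$ matrix operator with $\Psi = [\psi_{ij};\ i \in I, j \in J]$ and $\psi_{ij} \in \mathcal{L}_{b,j}^{\mathcal{L}_{a,i}}$, that is, $\psi_{ij} : \mathcal{L}_{a,i} \to \mathcal{L}_{b,j}$. It is worth noting that if $\Phi \in \mathcal{M}_{a \mapsto b}$, then $\Phi^T \in \mathcal{M}_{b \mapsto a}$.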
DEFINITION 2

The order relation $\preceq$ on elements of $\mathcal{M}_{a \mapsto b}$ is defined for all $\Phi, \Psi \in \mathcal{M}_{a \mapsto b}$ as $\Phi \preceq \Psi$ if and only if $\phi_{ij} \le \psi_{ij}$ for all $i \in I$ and $j \in J$.

Since $\mathcal{L}_{b,j}$ is a complete lattice for every $j \in J$, the set of all mappings $\mathcal{L}_{a,i} \to \mathcal{L}_{b,j}$, for all $i \in I$ and $j \in J$, has the structure of a complete lattice. This in turn induces a complete lattice structure on the set $\mathcal{M}_{a \mapsto b}$ of matrix operators. We shall denote the supremum and infimum in $\mathcal{M}_{a \mapsto b}$ by $\sqcup$ and $\sqcap$ respectively. Here $(\mathcal{M}_{a \mapsto b}, \preceq)$ has the least element $\Xi$ and the greatest element $\Gamma$ defined below.

DEFINITION 3

Matrix constant operators are defined as:

$$\Gamma = [\gamma_{ij};\ i \in I, j \in J], \quad \text{where } \gamma_{ij}(f_i) = I_{b,j};$$
$$\Xi = [\xi_{ij};\ i \in I, j \in J], \quad \text{where } \xi_{ij}(f_i) = O_{b,j}.$$

Hence for any $F \in \mathcal{L}_a$, we have $\Gamma(F) = I_b$ and $\Xi(F) = O_b$. We next define how a matrix operator from $\mathcal{M}_{a \mapsto b}$ operates on the elements of the set $\mathcal{L}_a$.

DEFINITION 4

For $\Psi \in \mathcal{M}_{a \mapsto b}$, $F \in \mathcal{L}_a$ and $G \in \mathcal{L}_b$, we define:

(a) $\Psi(F) = F \vee \Psi = G$, where $g_j = \bigvee_{i \in I} \psi_{ij}(f_i)$ for all $j \in J$;

(b) $\Psi(F) = F \wedge \Psi = G$, where $g_j = \bigwedge_{i \in I} \psi_{ij}(f_i)$ for all $j \in J$.

In the sequel, whenever we use the notation $\Psi(F)$, we are referring to both kinds of matrix operation defined above.
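Definition 4 can be made concrete by representing an $m \times n$ matrix operator as a nested list of callables, one per entry $\psi_{ij}$, acting on 1-D channel signals. The sketch below is illustrative only: the entries are taken to be either the identity or a flat dilation with a window of three samples (edge-padded), and all names are hypothetical.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def flat_dilate(f, r):
    """Flat dilation of a 1-D signal by a window of half-width r (edge-padded)."""
    return sliding_window_view(np.pad(f, r, mode="edge"), 2 * r + 1).max(axis=1)

def apply_sup(F, psi):
    """Definition 4(a): g_j = sup_i psi[i][j](f_i); psi is an m x n nested list of callables."""
    m, n = len(psi), len(psi[0])
    return [np.maximum.reduce([psi[i][j](F[i]) for i in range(m)]) for j in range(n)]

def apply_inf(F, psi):
    """Definition 4(b): g_j = inf_i psi[i][j](f_i)."""
    m, n = len(psi), len(psi[0])
    return [np.minimum.reduce([psi[i][j](F[i]) for i in range(m)]) for j in range(n)]

def dil(f):
    return flat_dilate(f, 1)

def ident(f):
    return f

F = [np.array([0, 3, 0, 0, 1]), np.array([2, 0, 0, 4, 0])]  # two input channels
psi = [[dil, ident], [ident, dil]]                           # 2 x 2 matrix operator
G_sup = apply_sup(F, psi)   # output channels under definition 4(a)
G_inf = apply_inf(F, psi)   # output channels under definition 4(b)
```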

PROPOSITION 1.

For $\Phi, \Psi \in \mathcal{M}_{a \mapsto b}$ and $F \in \mathcal{L}_a$, we have:

$$\Phi \preceq \Psi \implies \Phi(F) \preceq \Psi(F).$$


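Thus for any $M \subseteq \mathcal{M}_{a \mapsto b}$ and $F \in \mathcal{L}_a$, it is straightforward to show the following:

$$F \vee \Big(\bigsqcup M\Big) = \bigsqcup_{\Psi \in M} (F \vee \Psi),$$
$$F \wedge \Big(\bigsqcup M\Big) \succeq \bigsqcup_{\Psi \in M} (F \wedge \Psi),$$
$$F \vee \Big(\bigsqcap M\Big) \preceq \bigsqcap_{\Psi \in M} (F \vee \Psi),$$
$$F \wedge \Big(\bigsqcap M\Big) = \bigsqcap_{\Psi \in M} (F \wedge \Psi). \qquad (4)$$

Based on the previous discussion, we define three different types of composition between matrix operators.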


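DEFINITION 5

Matrix operators can be composed in the following ways:

(a) For $\Psi, \Phi \in \mathcal{M}_{a \mapsto b}$, we define
$$(\Psi\Phi) = \Phi \circ \Psi = [\phi_{ij} \circ \psi_{ij};\ i \in I, j \in J] \in \mathcal{M}_{a \mapsto b}.$$

(b) For $\Phi \in \mathcal{M}_{a \mapsto b}$, $\Psi \in \mathcal{M}_{b \mapsto c}$, we define
$$(\Psi\Phi) = \Phi \vee \Psi = \Big[\bigvee_{j \in J} (\psi_{jk} \circ \phi_{ij});\ i \in I, k \in K\Big] \in \mathcal{M}_{a \mapsto c}.$$

(c) For $\Phi \in \mathcal{M}_{a \mapsto b}$, $\Psi \in \mathcal{M}_{b \mapsto c}$, we define
$$(\Psi\Phi) = \Phi \wedge \Psi = \Big[\bigwedge_{j \in J} (\psi_{jk} \circ \phi_{ij});\ i \in I, k \in K\Big] \in \mathcal{M}_{a \mapsto c}.$$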


Here it should be noticed that for F e C a ; ('Fd>)(F) is not necessarily the same as 
(<F(F)). Further, the process of matrix operator composition is possible only between 
mpatible operators (i.e., matrix operators with compatible dimensions for composition), 
ms for definition 5(a); *F, d> and d> o 'F are m x n matrices and for definition 5(b,c), d> is 
x n; *F is n x p and (vF d>) (i.e., <t> v 'F and 4> A ❖) is m x p. It is evident that composition 
matrix operators is associative but not commutative. Further, it can be shown as a result 
the ordering of matrix operators that for all d>, *F, 0 e M a \-+b such that d> < »F implies 
o d> < © o »F. Similarly, for 0 e M a ^b and for all <F, *F € Mb^c such that <E> < *F 
plies that © v d> < © v 'F and © A <F < © A 'F which is equivalent to stating that 
»©) < (d/©). For a general case, with *F e M a ^b and M C M a \-+b we have: 

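$$\begin{aligned}
\Psi \circ \Big(\bigsqcup M\Big) &= \bigsqcup_{\Phi\in M} (\Psi \circ \Phi)\\
\Psi \circ \Big(\bigsqcap M\Big) &= \bigsqcap_{\Phi\in M} (\Psi \circ \Phi)
\end{aligned} \qquad (5)$$

On the other hand, for any $\Psi \in \mathcal{M}_{a\mapsto b}$ and $M \subseteq \mathcal{M}_{b\mapsto c}$, we have:
$$\begin{aligned}
\Psi \vee \Big(\bigsqcup M\Big) &= \bigsqcup_{\Phi\in M} (\Psi \vee \Phi)\\
\Psi \wedge \Big(\bigsqcup M\Big) &\ge \bigsqcup_{\Phi\in M} (\Psi \wedge \Phi)\\
\Psi \vee \Big(\bigsqcap M\Big) &\le \bigsqcap_{\Phi\in M} (\Psi \vee \Phi)\\
\Psi \wedge \Big(\bigsqcap M\Big) &= \bigsqcap_{\Phi\in M} (\Psi \wedge \Phi)
\end{aligned} \qquad (6)$$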
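The remark after Definition 5, that the composite operator applied to $F$ need not agree with applying the two operators one after the other, can be seen on a tiny example. This sketch assumes, as before, integer-valued channels with `max` as supremum; `compose_sup` is a hypothetical helper implementing Definition 5(b), and the entry operators are illustrative. With a non-increasing entry (negation) the two results differ; with an increasing entry on a chain (which commutes with finite maxima) they agree.

```python
from typing import Callable, List

Op = Callable[[int], int]
MatrixOp = List[List[Op]]

def apply_sup(F, Psi: MatrixOp) -> List[int]:
    """Definition 4(a) on the integer chain: g_j = max_i psi_ij(f_i)."""
    return [max(Psi[i][j](F[i]) for i in range(len(Psi))) for j in range(len(Psi[0]))]

def compose_sup(Phi: MatrixOp, Psi: MatrixOp) -> MatrixOp:
    """Definition 5(b): Phi is m x n, Psi is n x p; entry (i, k) of the m x p result
    is the scalar operator x -> max_j psi_jk(phi_ij(x))."""
    n = len(Psi)
    if len(Phi[0]) != n:
        raise ValueError("incompatible dimensions")
    return [[(lambda x, i=i, k=k: max(Psi[j][k](Phi[i][j](x)) for j in range(n)))
             for k in range(len(Psi[0]))] for i in range(len(Phi))]

def ident(x):
    return x

Phi = [[ident], [ident]]          # 2 x 1: merges two channels by a supremum
neg = [[lambda x: -x]]            # 1 x 1, not increasing
dbl = [[lambda x: 2 * x]]         # 1 x 1, increasing on the chain
F = [1, 3]

print(apply_sup(apply_sup(F, Phi), neg), apply_sup(F, compose_sup(Phi, neg)))
print(apply_sup(apply_sup(F, Phi), dbl), apply_sup(F, compose_sup(Phi, dbl)))
```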


2.5 Matrix increasing operators


We next describe matrix increasing operators and show that the matrix dilation operator and the matrix erosion operator are examples of increasing operators. Further, matrix dilation and matrix erosion are dual operators in a lattice-theoretic sense.

DEFINITION 6

For any $\Phi \in \mathcal{M}_{a\mapsto b}$, we define the following:

(a) The operator $\Phi$ is a matrix increasing operator if and only if $\varphi_{ij}$ for all $i \in I$, $j \in J$ are increasing operators.

(b) The operator $\Phi$ is a matrix dilation operator if and only if $\varphi_{ij}$ for all $i \in I$, $j \in J$ are dilation operators and the matrix operator operates according to Definition 4(a).

(c) The operator $\Phi$ is a matrix erosion operator if and only if $\varphi_{ij}$ for all $i \in I$, $j \in J$ are erosion operators and the matrix operator operates according to Definition 4(b).

Matrix dilation operators will be denoted by $\mathcal{D}$, with entries $\delta_{ij}$ for all $i \in I$, $j \in J$. Matrix erosion operators will be denoted by $\mathcal{E}$, with entries $\varepsilon_{ij}$ for all $i \in I$, $j \in J$.

PROPOSITION 2.

For any $\Phi \in \mathcal{M}_{a\mapsto b}$ such that $\Phi$ is a matrix increasing operator, the following statements hold and are equivalent.

(a) For every $F, G \in \mathcal{L}_a$, $F \le G$ implies $\Phi(F) \le \Phi(G)$.

(b) For any $S \subseteq \mathcal{L}_a$, we have $\Phi\big(\bigsqcup S\big) \ge \bigsqcup_{F\in S} \Phi(F)$.

(c) For any $S \subseteq \mathcal{L}_a$, we have $\Phi\big(\bigsqcap S\big) \le \bigsqcap_{F\in S} \Phi(F)$.


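PROPOSITION 3.

(a) Let $\mathcal{D} \in \mathcal{M}_{a\mapsto b}$ be a matrix dilation operator; then for any $S \subseteq \mathcal{L}_a$, the following holds:
$$\Big(\bigsqcup S\Big) \vee \mathcal{D} = \bigsqcup_{F\in S} (F \vee \mathcal{D}).$$

(b) Let $\mathcal{E} \in \mathcal{M}_{a\mapsto b}$ be a matrix erosion operator; then for any $S \subseteq \mathcal{L}_a$, the following holds:
$$\Big(\bigsqcap S\Big) \wedge \mathcal{E} = \bigsqcap_{F\in S} (F \wedge \mathcal{E}).$$

It is evident that, as $\delta_{ij}$ and $\varepsilon_{ij}$ for all $i \in I$, $j \in J$ are increasing operators, the matrix erosion operator $\mathcal{E}$ and the matrix dilation operator $\mathcal{D}$ are matrix increasing operators.

The following facts follow as a consequence of the above discussion and are properties of operator composition analogous to (5) and (6).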

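• Let $\Theta$ be a matrix increasing operator; then for any $\Phi, \Psi \in \mathcal{M}_{a\mapsto b}$, $\Phi \le \Psi$ implies $(\Theta\Phi) \le (\Theta\Psi)$.

• Let $\Theta$ be a matrix dilation operator and $M \subseteq \mathcal{M}_{a\mapsto b}$; then
$$\Big(\Theta\big(\bigsqcup M\big)\Big) = \bigsqcup_{\Psi\in M} (\Theta\Psi), \qquad \Big(\Theta\big(\bigsqcap M\big)\Big) \le \bigsqcap_{\Psi\in M} (\Theta\Psi).$$

• Let $\Theta$ be a matrix erosion operator and $M \subseteq \mathcal{M}_{a\mapsto b}$; then
$$\Big(\Theta\big(\bigsqcap M\big)\Big) = \bigsqcap_{\Psi\in M} (\Theta\Psi).$$

Here it should be kept in mind that the operator composition results given above are valid for the appropriate dimension of the operator $\Theta$, i.e., $\Theta \in \mathcal{M}_{a\mapsto b}$ for the composition operator $\circ$ and $\Theta \in \mathcal{M}_{b\mapsto c}$ for the composition operator $\vee$ or $\wedge$.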

PROPOSITION 4.

The set of matrix increasing operators from $\mathcal{L}_a \mapsto \mathcal{L}_b$ is closed under composition by $\circ$ and is a complete sublattice of $\mathcal{M}_{a\mapsto b}$.

Since matrix dilation and matrix erosion are increasing operators, they satisfy results analogous to Proposition 4. This is stated in the following proposition.

PROPOSITION 5.

(a) The set of matrix dilation operators from $\mathcal{L}_a \mapsto \mathcal{L}_b$ is closed under composition by $\circ$ and is a sup-closed subset of $\mathcal{M}_{a\mapsto b}$.

(b) The set of matrix erosion operators from $\mathcal{L}_a \mapsto \mathcal{L}_b$ is closed under composition by $\circ$ and is an inf-closed subset of $\mathcal{M}_{a\mapsto b}$.

As a consequence, the set of matrix dilation operators and the set of matrix erosion operators are complete lattices. The least matrix dilation operator maps every element to the least element $O$, and the supremum of dilations is $\bigsqcup$. Dually, the greatest matrix erosion operator maps every element to the greatest element $I$, and the infimum of erosions is $\bigsqcap$.

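PROPOSITION 6.

Let $\mathcal{D} \in \mathcal{M}_{a\mapsto b}$ and $\mathcal{E} \in \mathcal{M}_{b\mapsto a}$ be matrix increasing operators. If the pair $(\mathcal{E}, \mathcal{D})$ is an adjunction (i.e., $F \vee \mathcal{D} \le G \iff F \le G \wedge \mathcal{E}$), then $\mathcal{D}$ is an $m \times n$ matrix dilation operator and $\mathcal{E}$ is an $n \times m$ matrix erosion operator.

The above proposition associates with every erosion operator $\mathcal{E}$ a corresponding dilation operator $\mathcal{D}$, and vice versa. The pair $(\mathcal{E}, \mathcal{D})$ is called an adjunction, with $\mathcal{E}$ being the upper adjoint and $\mathcal{D}$ the lower adjoint.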

In the proposition to follow, we use $\Delta_b = [\lambda^b_{j_1j_2};\ j_1, j_2 \in J]$, where $\lambda^b_{j_1j_2}$ is the identity for $j_1 = j_2$ and a constant operator for $j_1 \ne j_2$, such that $\lambda^b_{j_1j_1}(f_{j_1}) = f_{j_1}$ and $\lambda^b_{j_1j_2}(f_{j_1}) = I_{j_2}$. Similarly, we use $\Delta_a = [\lambda^a_{i_1i_2};\ i_1, i_2 \in I]$, where $\lambda^a_{i_1i_1}(f_{i_1}) = f_{i_1}$ and $\lambda^a_{i_1i_2}(f_{i_1}) = O_{i_2}$ for $i_1 \ne i_2$. Hence $F \wedge \Delta_b = F$ for $F \in \mathcal{L}_b$ and $F \vee \Delta_a = F$ for $F \in \mathcal{L}_a$.

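PROPOSITION 7.

For all matrix increasing operators $\mathcal{D} \in \mathcal{M}_{a\mapsto b}$ and $\mathcal{E} \in \mathcal{M}_{b\mapsto a}$ such that the pair $(\mathcal{E}, \mathcal{D})$ is an adjunction, the following hold:

(a) $\mathcal{D} \wedge \mathcal{E} \ge \Delta_a$

(b) $\mathcal{E} \vee \mathcal{D} \le \Delta_b$

(c) $\mathcal{E} \vee \mathcal{D} \wedge \mathcal{E} = \mathcal{E}$

(d) $\mathcal{D} \wedge \mathcal{E} \vee \mathcal{D} = \mathcal{D}$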

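PROPOSITION 8.

(a) Given that $(\mathcal{E}_1, \mathcal{D}_1)$ and $(\mathcal{E}_2, \mathcal{D}_2)$ are adjunctions, $\mathcal{E}_1 \ge \mathcal{E}_2$ if and only if $\mathcal{D}_1 \le \mathcal{D}_2$.

(b) Given that the pairs $(\mathcal{E}^{(t)}, \mathcal{D}^{(t)})$ are adjunctions for all $t \in T$, $\big(\bigsqcap_{t\in T} \mathcal{E}^{(t)}, \bigsqcup_{t\in T} \mathcal{D}^{(t)}\big)$ is also an adjunction.

(c) Given a pair of matrix dilation operators $\mathcal{D}_1, \mathcal{D}_2 \in \mathcal{M}_{a\mapsto b}$ and a pair of matrix erosion operators $\mathcal{E}_1, \mathcal{E}_2 \in \mathcal{M}_{b\mapsto a}$ such that the pairs $(\mathcal{E}_1, \mathcal{D}_1)$ and $(\mathcal{E}_2, \mathcal{D}_2)$ are adjunctions, $(\mathcal{E}_1 \wedge \mathcal{E}_2, \mathcal{D}_2 \vee \mathcal{D}_1)$ is an adjunction.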

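DEFINITION 7

A projection map $\mathcal{P}_i : \mathcal{L} \mapsto \mathcal{L}_i$, for all $i \in I$, from a direct product complete lattice $\mathcal{L}$ to the $i$th component $\mathcal{L}_i$ is defined as $\mathcal{P}_i(F) = f_i$ for all $i \in I$, with $F = [f_1, \ldots, f_m] \in \mathcal{L}$ and $f_i \in \mathcal{L}_i$.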

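The projection map (or operator) plays an important role in certain cases. These projection operators can also be defined in terms of matrix operators, i.e., matrix dilation/erosion operators can be obtained for each projection operator.

Let $\mathcal{D}_{\mathcal{P}_i} = [\delta_{1,i}\ \delta_{2,i}\ \cdots\ \delta_{i,i}\ \cdots\ \delta_{m,i}]^T$ and $\mathcal{E}_{\mathcal{P}_i} = [\varepsilon_{1,i}\ \varepsilon_{2,i}\ \cdots\ \varepsilon_{i,i}\ \cdots\ \varepsilon_{m,i}]^T$ for all $i \in I$. Here $\delta_{i',i}(f_{i'}) = O_i$ and $\varepsilon_{i',i}(f_{i'}) = I_i$ for all $i, i' \in I$ with $i \ne i'$, and $\delta_{i,i}(f_i) = \varepsilon_{i,i}(f_i) = f_i$ for all $i \in I$. With these operators, we have $F \vee \mathcal{D}_{\mathcal{P}_i} = f_i$ and $F \wedge \mathcal{E}_{\mathcal{P}_i} = f_i$. Next we define operators $\mathcal{E}'_{\mathcal{P}_i} : \mathcal{L}_i \mapsto \mathcal{L}$ and $\mathcal{D}'_{\mathcal{P}_i} : \mathcal{L}_i \mapsto \mathcal{L}$ such that $\mathcal{E}'_{\mathcal{P}_i} = [\varepsilon_{i,1}\ \varepsilon_{i,2}\ \cdots\ \varepsilon_{i,i}\ \cdots\ \varepsilon_{i,m}]$ and $\mathcal{D}'_{\mathcal{P}_i} = [\delta_{i,1}\ \delta_{i,2}\ \cdots\ \delta_{i,i}\ \cdots\ \delta_{i,m}]$. We see that
$$f_i \wedge \mathcal{E}'_{\mathcal{P}_i} = [I_1, I_2, \ldots, f_i, \ldots, I_m], \qquad f_i \vee \mathcal{D}'_{\mathcal{P}_i} = [O_1, O_2, \ldots, f_i, \ldots, O_m].$$

PROPOSITION 9.

For the operators $\mathcal{E}_{\mathcal{P}_i}$ and $\mathcal{D}_{\mathcal{P}_i}$ from $\mathcal{L} \mapsto \mathcal{L}_i$ and $\mathcal{E}'_{\mathcal{P}_i}$ and $\mathcal{D}'_{\mathcal{P}_i}$ from $\mathcal{L}_i \mapsto \mathcal{L}$ defined above, the pairs $(\mathcal{E}_{\mathcal{P}_i}, \mathcal{D}'_{\mathcal{P}_i})$ and $(\mathcal{E}'_{\mathcal{P}_i}, \mathcal{D}_{\mathcal{P}_i})$ are adjunctions.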
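Proposition 9 can be verified by brute force on a small product lattice. This is a minimal sketch, assuming three components, each the chain {0, 1, 2, 3} with O = 0 and I = 3; the helper names (`D_P`, `E_P`, `D_P_prime`, `E_P_prime`, `is_adjunction`) are hypothetical. A pair (eps, delta) is checked to be an adjunction by testing delta(x) <= y exactly when x <= eps(y), for every x and y.

```python
from itertools import product

BOT, TOP = 0, 3     # O and I of each component chain {0, 1, 2, 3} (assumed)
M = 3               # number of components

def ident(x):
    return x

def const(c):
    return lambda x: c

def apply_sup(F, Psi):   # Definition 4(a)
    return [max(Psi[i][j](F[i]) for i in range(len(Psi))) for j in range(len(Psi[0]))]

def apply_inf(F, Psi):   # Definition 4(b)
    return [min(Psi[i][j](F[i]) for i in range(len(Psi))) for j in range(len(Psi[0]))]

def D_P(i):        # m x 1 column: identity in row i, constant O elsewhere
    return [[ident if r == i else const(BOT)] for r in range(M)]

def E_P(i):        # m x 1 column: identity in row i, constant I elsewhere
    return [[ident if r == i else const(TOP)] for r in range(M)]

def D_P_prime(i):  # 1 x m row: identity in column i, constant O elsewhere
    return [[ident if c == i else const(BOT) for c in range(M)]]

def E_P_prime(i):  # 1 x m row: identity in column i, constant I elsewhere
    return [[ident if c == i else const(TOP) for c in range(M)]]

def leq(F, G):
    return all(a <= b for a, b in zip(F, G))

def is_adjunction(delta, eps, dom, cod):
    """(eps, delta) is an adjunction iff delta(x) <= y  <=>  x <= eps(y) for all x, y."""
    return all(leq(delta(x), y) == leq(x, eps(y)) for x in dom for y in cod)

L = [list(F) for F in product(range(BOT, TOP + 1), repeat=M)]   # the product lattice
Li = [[v] for v in range(BOT, TOP + 1)]                          # a single component

for i in range(M):
    ok1 = is_adjunction(lambda f: apply_sup(f, D_P_prime(i)), lambda G: apply_inf(G, E_P(i)), Li, L)
    ok2 = is_adjunction(lambda F: apply_sup(F, D_P(i)), lambda g: apply_inf(g, E_P_prime(i)), L, Li)
    print(i, ok1, ok2)
```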

3. Translation invariant operators

In this section, we are concerned with multivariate morphological operators that are translation invariant. We will only discuss spatial translation invariance and will not consider gray level translation invariance. Towards this end, we will work with a completely lattice ordered commutative (clc) monoid structure for images, as used by Hseuh (1992) and Singh & Siddiqi (1995) for the univariate case. Some of the results presented in this section have already appeared (Wilson 1992) in the context of matrix morphology. However, these results as presented by Wilson (1992) are only applicable to the binary and gray-level cases and have been proved using concepts from set theory. On the other hand, the results detailed here are applicable to any chain structure and have been proved using results from the theory of lattice ordered monoids. These results are generalized versions of matrix morphology and are applicable in a larger gamut of situations.

We have assumed in the previous section that $\mathcal{L} = (\mathcal{L}, \sqcup, \sqcap)$ is a complete lattice. Here we additionally assume that $\mathcal{L}_i = Q_i^E$ and $Q_i = Q$ for all $i \in I$, and that a binary operation $\star$ is defined on these component sets such that $(\mathcal{L}_i, \star)$ is a commutative monoid. Further, the domain set $E$ is assumed to possess an abelian group structure under a binary operation.

PROPOSITION 10.

The direct product $(\mathcal{L}, \star)$ of the component commutative monoids $(\mathcal{L}_i, \star)$ is itself a commutative monoid under the pointwise binary operation defined for all $F, G \in \mathcal{L}$ as
$$F \star G = [f_1 \star g_1,\ f_2 \star g_2,\ \ldots,\ f_m \star g_m]$$
for all $f_i, g_i \in \mathcal{L}_i$ with $i \in I$.


Theorem 2. The direct product structure $(\mathcal{L}, \sqcup, \sqcap, \star)$ is a clc-monoid.

The least element of $\mathcal{L}$ is $O$, which implies $\bigsqcup\emptyset = O$. Therefore, for all $F \in \mathcal{L}$, we have $F \star O = F \star \big(\bigsqcup\emptyset\big) = \bigsqcup_{G\in\emptyset} (F \star G) = \bigsqcup\emptyset = O$. Every clc-monoid is residuated, and the residuation is given for all $F, G \in \mathcal{L}$ as
$$F : G = \bigsqcup\{H \in \mathcal{L} \mid G \star H \le F\}. \qquad (7)$$

Thus the structure of a multivariate signal is that of a clc-monoid. The dilation $\oplus$ and erosion $\ominus$ operators on univariate signals have already been defined (Hseuh 1992; Singh & Siddiqi 1995). We now define the multivariate operators.

DEFINITION 8

(a) A matrix dilation operator is defined as:
$$H = \mathcal{D}_G(F) = F \boxplus G, \quad \text{where } h_j = \bigvee_{i\in I} (f_i \oplus g_{ij}) \text{ for all } j \in J. \qquad (8)$$

(b) A matrix erosion operator is defined as:
$$H = \mathcal{E}_G(F) = F \boxminus G, \quad \text{where } h_j = \bigwedge_{i\in I} (f_i \ominus g_{ij}) \text{ for all } j \in J. \qquad (9)$$

(c) The matrix dual operator is defined as $G^d = [g^d_{ij};\ i \in I,\ j \in J]$, where for $g, h \in \mathcal{L}$ we have $g \le h$ if and only if $h^d \le g^d$, and $(g^d)^d = g$.

(d) The matrix reflected-dual operator is defined as $G^* = [g^*_{ij};\ i \in I,\ j \in J]$, where $g^*(x) = g^d(-x)$.

(e) The matrix translated operator is defined as $G_r = [(g_{ij})_r;\ i \in I,\ j \in J]$, where $g_r(x) = g(x - r)$.

(f) The transpose of a matrix operator is defined as $G^T = [g_{ji};\ i \in I,\ j \in J]$.

In the above definitions, $F$ is a multichannel image and $G$ is a matrix structuring element. In the following, we state the duality and translation invariance of matrix operators.
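Definition 8 can be made concrete for one instance of the clc-monoid: integer-valued one-dimensional signals with the max-plus (additive) univariate dilation $(f \oplus g)(x) = \max_y f(x-y) + g(y)$ and its adjoint erosion $(h \ominus g)(x) = \min_y h(x+y) - g(y)$. This sketch is an assumption-laden illustration, not the paper's general construction: structuring functions are dicts from integer offsets to values, samples falling outside the finite signal are simply ignored (which keeps the univariate adjunction exact on the finite domain), and every function name is hypothetical. It also checks the adjunction of Proposition 15(a), $F \boxplus G \le H \iff F \le H \boxminus G^T$, on random inputs with a non-square $G$ (2 input channels, 3 output channels).

```python
import math
import random

NEG, POS = -math.inf, math.inf

def dil1(f, g):
    """Univariate max-plus dilation (f (+) g)(x) = max_y f(x - y) + g(y).
    g is a dict {offset: value}; samples outside the signal are ignored."""
    n = len(f)
    return [max((f[x - y] + v for y, v in g.items() if 0 <= x - y < n), default=NEG)
            for x in range(n)]

def ero1(h, g):
    """Univariate erosion (h (-) g)(x) = min_y h(x + y) - g(y), the adjoint of dil1."""
    n = len(h)
    return [min((h[x + y] - v for y, v in g.items() if 0 <= x + y < n), default=POS)
            for x in range(n)]

def mat_dilate(F, G):
    """Equation (8): channel j of the result is max_i (f_i (+) g_ij); G[i][j] per input i."""
    return [[max(col) for col in zip(*(dil1(F[i], G[i][j]) for i in range(len(G))))]
            for j in range(len(G[0]))]

def mat_erode(F, G):
    """Equation (9): channel j of the result is min_i (f_i (-) g_ij)."""
    return [[min(col) for col in zip(*(ero1(F[i], G[i][j]) for i in range(len(G))))]
            for j in range(len(G[0]))]

def transpose(G):
    """Definition 8(f): (G^T)_ji = g_ij."""
    return [list(row) for row in zip(*G)]

def leq(F, H):
    """Marginal (channelwise, pointwise) order on multichannel signals."""
    return all(a <= b for fc, hc in zip(F, H) for a, b in zip(fc, hc))

# Hypothetical example: 2 input channels, 3 output channels, offsets -1, 0, 1.
rng = random.Random(0)
G = [[{y: rng.randint(-2, 2) for y in (-1, 0, 1)} for _ in range(3)] for _ in range(2)]
F = [[rng.randint(0, 9) for _ in range(6)] for _ in range(2)]
print(mat_dilate(F, G))
print(mat_erode(mat_dilate(F, G), transpose(G)))
```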

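PROPOSITION 11.

Let $(\mathcal{L}, \sqcup, \sqcap, \oplus)$ be a self-dual clc-monoid (i.e., $f^d \oplus g = (f \ominus g)^d$); then

(a) $(F^* \boxplus G)^* = F \boxminus G$

(b) $(F^* \boxminus G)^* = F \boxplus G$

PROPOSITION 12.

For any matrix operators, the following hold:

(a) $F_r \boxplus G = (F \boxplus G)_r = F \boxplus G_r$

(b) $F_r \boxminus G = (F \boxminus G)_r = F \boxminus G_{-r}$.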

The increasing property of matrix operators is described in the next proposition.

PROPOSITION 13.

(a) For all $F, G \in \mathcal{L}$, $F \le G$ implies $F \boxplus H \le G \boxplus H$.

(b) For all $H, H' \in \mathcal{M}$ and $F \in \mathcal{L}$, $H \le H'$ implies $F \boxplus H \le F \boxplus H'$.






Balvinder Singh and M U Siddiqi 


(c) For all $F, G \in \mathcal{L}$, $F \le G$ implies $F \boxminus H \le G \boxminus H$.

(d) For all $H, H' \in \mathcal{M}$ and $F \in \mathcal{L}$, $H \le H'$ implies $F \boxminus H \ge F \boxminus H'$.

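The next proposition states eight types of distributive laws for matrix operators.

PROPOSITION 14.

(a) $F \boxplus \Big(\bigsqcup_{t\in T} G^{(t)}\Big) = \bigsqcup_{t\in T} \big(F \boxplus G^{(t)}\big)$

(b) $F \boxplus \Big(\bigsqcap_{t\in T} G^{(t)}\Big) \le \bigsqcap_{t\in T} \big(F \boxplus G^{(t)}\big)$

(c) $F \boxminus \Big(\bigsqcap_{t\in T} G^{(t)}\Big) \ge \bigsqcup_{t\in T} \big(F \boxminus G^{(t)}\big)$

(d) $F \boxminus \Big(\bigsqcup_{t\in T} G^{(t)}\Big) = \bigsqcap_{t\in T} \big(F \boxminus G^{(t)}\big)$

(e) $\Big(\bigsqcup_{t\in T} F^{(t)}\Big) \boxplus G = \bigsqcup_{t\in T} \big(F^{(t)} \boxplus G\big)$

(f) $\Big(\bigsqcap_{t\in T} F^{(t)}\Big) \boxplus G \le \bigsqcap_{t\in T} \big(F^{(t)} \boxplus G\big)$

(g) $\Big(\bigsqcup_{t\in T} F^{(t)}\Big) \boxminus G \ge \bigsqcup_{t\in T} \big(F^{(t)} \boxminus G\big)$

(h) $\Big(\bigsqcap_{t\in T} F^{(t)}\Big) \boxminus G = \bigsqcap_{t\in T} \big(F^{(t)} \boxminus G\big)$

We next state results concerning matrix dilation and matrix erosion operators.

PROPOSITION 15.

(a) $F \boxplus G \le H \iff F \le H \boxminus G^T$.

(b) $(F \boxminus G) \boxplus G^T \le F \le (F \boxplus G) \boxminus G^T$.

(c) $(F \boxplus G) \boxplus H = F \boxplus (G \boxplus H)$.

(d) $(F \boxminus G) \boxminus H = F \boxminus (G \boxplus H)$.

(e) $F \boxminus (G \boxminus H) \ge (F \boxminus G) \boxplus H$.

(f) $F \boxplus (G \boxminus H) \le (F \boxplus G) \boxminus H$.

In the following we discuss matrix opening and matrix closing operators.

DEFINITION 9

(a) The matrix opening operator is defined as $F \circ G = (F \boxminus G) \boxplus G^T$.

(b) The matrix closing operator is defined as $F \bullet G = (F \boxplus G) \boxminus G^T$.

PROPOSITION 16.

(a) Duality of matrix opening and closing operators: $(F^* \circ G)^* = F \bullet G$ and $(F^* \bullet G)^* = F \circ G$.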


(b) Increasing properties: $F \le G \implies F \circ H \le G \circ H$ and $F \bullet H \le G \bullet H$.

(c) The opening operator is anti-extensive, $F \circ G \le F$, and the closing operator is extensive, $F \bullet G \ge F$.

(d) Idempotence of matrix opening and closing operators:
$$F \circ G = (F \circ G) \circ G \quad\text{and}\quad F \bullet G = (F \bullet G) \bullet G.$$
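Definition 9 and properties (c) and (d) of Proposition 16 can be checked in the same illustrative setting: integer-valued one-dimensional signals with max-plus structuring functions (a dict from offsets to values), out-of-range samples ignored. This is a sketch under those assumptions, not the paper's general clc-monoid construction; it further assumes every structuring function contains offset 0 so that no maximum or minimum is taken over an empty set. The helper names are hypothetical.

```python
import random

def dil1(f, g):
    """Univariate max-plus dilation; g = {offset: value} must contain offset 0."""
    n = len(f)
    return [max(f[x - y] + v for y, v in g.items() if 0 <= x - y < n) for x in range(n)]

def ero1(h, g):
    """Univariate erosion, the adjoint of dil1 on the same finite domain."""
    n = len(h)
    return [min(h[x + y] - v for y, v in g.items() if 0 <= x + y < n) for x in range(n)]

def mat_dilate(F, G):   # equation (8)
    return [[max(c) for c in zip(*(dil1(F[i], G[i][j]) for i in range(len(G))))]
            for j in range(len(G[0]))]

def mat_erode(F, G):    # equation (9)
    return [[min(c) for c in zip(*(ero1(F[i], G[i][j]) for i in range(len(G))))]
            for j in range(len(G[0]))]

def transpose(G):
    return [list(r) for r in zip(*G)]

def opening(F, G):      # Definition 9(a): (F [-] G) [+] G^T
    return mat_dilate(mat_erode(F, G), transpose(G))

def closing(F, G):      # Definition 9(b): (F [+] G) [-] G^T
    return mat_erode(mat_dilate(F, G), transpose(G))

def leq(F, H):
    """Marginal (channelwise, pointwise) order."""
    return all(a <= b for fc, hc in zip(F, H) for a, b in zip(fc, hc))

# Hypothetical 2-channel signal and 2 x 3 matrix structuring element.
rng = random.Random(1)
G = [[{y: rng.randint(-2, 2) for y in (-1, 0, 1)} for _ in range(3)] for _ in range(2)]
F = [[rng.randint(0, 9) for _ in range(8)] for _ in range(2)]
opened, closed = opening(F, G), closing(F, G)
print(opened)
print(closed)
```

Anti-extensivity and idempotence follow from the exact adjunction between `mat_erode(., G)` and `mat_dilate(., transpose(G))`, so the checks below hold for any such `G`, not just this random one.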

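Next we state some weak distributive laws concerning matrix opening and closing operators.

PROPOSITION 17.

(a) $\Big(\bigsqcup_{t\in T} F^{(t)}\Big) \circ G \ge \bigsqcup_{t\in T} \big(F^{(t)} \circ G\big)$.

(b) $\Big(\bigsqcap_{t\in T} F^{(t)}\Big) \circ G \le \bigsqcap_{t\in T} \big(F^{(t)} \circ G\big)$.

(c) $\Big(\bigsqcup_{t\in T} F^{(t)}\Big) \bullet G \ge \bigsqcup_{t\in T} \big(F^{(t)} \bullet G\big)$.

(d) $\Big(\bigsqcap_{t\in T} F^{(t)}\Big) \bullet G \le \bigsqcap_{t\in T} \big(F^{(t)} \bullet G\big)$.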


4. Quasi-lattices and morphological operators

In earlier sections, we imposed a partial ordering on the cartesian product set by componentwise ordering (i.e., marginal ordering) and subsequently obtained morphological matrix operators. Another possibility would have been to apply a linear transformation to the multichannel image, followed by the application of the multivariate morphological operators discussed in §2. Such an approach has been investigated by Eo (1992) and Goutsias et al (1994). This approach also utilizes the marginal ordering principle. However, it should be realized that with marginal ordering not all elements of the cartesian product set can be compared. To overcome this limitation, one may define a mapping procedure such that all elements of the mapped set can be compared. This is possible if some kind of reduced ordering is employed, since it imposes a total ordering on the set of multivariate data. This raises the following questions: What should the mapped set be, and what kinds of mappings must be considered? Since the mapped set should have a total ordering over its elements, one can choose the set $\mathbb{R}$. Let us assume that there exists a surjective mapping $Q : \mathcal{L} \to \mathbb{R}$. The actual mappings that are of interest will be considered later in this section. To begin with, we give the following definitions.

DEFINITION 10

(a) The binary order relation $\preceq$ is defined for all $F, G \in \mathcal{L}$ as $F \preceq G$ if and only if $Q(F) \le Q(G)$.

(b) The relation $\simeq$ is defined for all $F, G \in \mathcal{L}$ as $F \simeq G$ if and only if $F \preceq G$ and $G \preceq F$, that is, $Q(F) = Q(G)$.

PROPOSITION 18.

The structure $(\mathcal{L}, \preceq)$ is a quasi-ordered set.

PROPOSITION 19.

In a quasi-ordered set $(\mathcal{L}, \preceq)$, we have:

(a) The relation $\simeq$ is an equivalence relation on $\mathcal{L}$.

(b) If $C_1$ and $C_2$ are two equivalence classes of the relation $\simeq$, then $F_1 \preceq F_2$ either for no $F_1 \in C_1$ and $F_2 \in C_2$ or for all $F_1 \in C_1$ and $F_2 \in C_2$.

(c) The quotient set $S = \mathcal{L}/\!\simeq$ is a poset, where $C_1 \le C_2$ is defined to mean that $F \preceq G$ for all $F \in C_1$ and $G \in C_2$.

The equivalence class is defined as $C(f) = \{F \in \mathcal{L} \mid Q(F) = f\}$ for $f \in \mathbb{R}$. There exist mappings $Q^{-1} : \mathbb{R} \to \mathcal{L}$ such that $QQ^{-1}(f) = f$ for all $f \in \mathbb{R}$; that is, $QQ^{-1}$ is the identity operator $\Delta$. However, it should be noted that $Q^{-1}Q$ is not necessarily the identity operator $\Delta$, although $QQ^{-1}Q = Q$. Hence the operator $Q^{-1}$ is termed a pseudo-inverse of $Q$. Thus a quasi-ordering relation is imposed on the set $\mathcal{L}$ with the help of a surjective map $Q$ and the order on $\mathbb{R}$. In a similar manner, one can define the supremum $\sqcup$ and infimum $\sqcap$ of any subset of $\mathcal{L}$. Hence we have
$$Q\Big(\bigsqcup_{t\in T} F^{(t)}\Big) = \bigvee_{t\in T} Q\big(F^{(t)}\big) = Q\big(F^{(i)}\big) \implies \bigsqcup_{t\in T} F^{(t)} \simeq F^{(i)}. \qquad (10)$$

Similarly, we also have
$$Q\Big(\bigsqcap_{t\in T} F^{(t)}\Big) = \bigwedge_{t\in T} Q\big(F^{(t)}\big) = Q\big(F^{(s)}\big) \implies \bigsqcap_{t\in T} F^{(t)} \simeq F^{(s)}. \qquad (11)$$

Here we refer to $(\mathcal{L}, \sqcup, \sqcap)$ as a complete quasi-lattice since $\mathcal{L}$ is a quasi-ordered set.

We are now in a position to obtain morphological operators over complete quasi-lattices. However, it should be realized at this point that these operators essentially act on equivalence classes of $\mathcal{L}$. This is because any operator has to take all elements of one equivalence class to elements of some other equivalence class. The quasi-ordering relation $\preceq$ on $\mathcal{L}$ induces a quasi-ordering on the operators over the set $\mathcal{L}$: $\Phi \preceq \Psi$ if and only if $\Phi(F) \preceq \Psi(F)$ for all $F \in \mathcal{L}$. It is evident that the set of operators mapping $\mathcal{L}$ to $\mathcal{L}$ is also a quasi-ordered set. An equivalence relation can also be defined on operators: we say that $\Phi \simeq \Psi$ if and only if $Q\Phi = Q\Psi$. This equivalence relation partitions the set of quasi-increasing operators into equivalence classes.

DEFINITION 11

(a) An operator $\Psi : \mathcal{L} \mapsto \mathcal{L}$ is a quasi-increasing operator if for all $F, G \in \mathcal{L}$, $F \preceq G$ implies $\Psi(F) \preceq \Psi(G)$.

(b) A pair of operators $(\mathcal{E}, \mathcal{D})$ is called a quasi-adjunction if for all $F, G \in \mathcal{L}$ we have:
$$\mathcal{D}(F) \preceq G \iff F \preceq \mathcal{E}(G).$$

The set of quasi-increasing operators is closed under operator composition. Further, if the pair of operators $(\mathcal{E}, \mathcal{D})$ is a quasi-adjunction, then it is straightforward to verify that $\mathcal{E}$ and $\mathcal{D}$ are quasi-increasing operators. It should be noted that $Q^{-1}Q \simeq \Delta$ and that $Q^{-1}Q$ is a quasi-increasing operator. Thus we have the following proposition.

PROPOSITION 20.

Let $\Psi$ be a quasi-increasing operator on $\mathcal{L}$; then we have
$$\Psi Q^{-1}Q \simeq Q^{-1}Q\Psi.$$

PROPOSITION 21.

An operator $\Psi : \mathcal{L} \mapsto \mathcal{L}$ is quasi-increasing if and only if there exists an increasing operator $\psi : \mathbb{R} \mapsto \mathbb{R}$ such that $\psi Q = Q\Psi$. Further, the operator $\psi$ is uniquely determined from $\Psi$ as $\psi = Q\Psi Q^{-1}$, where $Q^{-1}$ is any pseudo-inverse of $Q$.

Readers interested in the proof of the above proposition can refer to Goutsias et al (1994). We observe that there are many operators $\Psi$ which map to the same operator $\psi$. All these operators are said to belong to an equivalence class of operators defined as $C(\psi) = \{\Psi \mid \psi = Q\Psi Q^{-1}\}$. We can now state the following proposition.

PROPOSITION 22.

For any $\Psi \in C(\psi)$ and $\Phi = Q^{-1}Q\Psi$, for any pseudo-inverse $Q^{-1}$, we have $\Phi \in C(\psi)$.

Hence one can obtain all the elements of $C(\psi)$ by composing any element $\Psi \in C(\psi)$ with $Q^{-1}Q$ for all possible pseudo-inverse operators $Q^{-1}$.

PROPOSITION 23.

The pair of operators $(\mathcal{E}, \mathcal{D})$ is a quasi-adjunction on $\mathcal{L}$ for all $\mathcal{E} \in C(\varepsilon)$ and $\mathcal{D} \in C(\delta)$ if and only if the pair $(\varepsilon, \delta)$ is an adjunction on $\mathbb{R}$.

Since $\varepsilon$ and $\delta$ are erosion and dilation operators on $\mathbb{R}$, we term $\mathcal{E}$ and $\mathcal{D}$ quasi-erosion and quasi-dilation operators, respectively. To illustrate how the results for erosion and dilation operators over complete lattices can be modified into corresponding results for quasi-erosion and quasi-dilation operators over complete quasi-lattices, we state the following proposition.

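PROPOSITION 24.

Let $(\mathcal{E}, \mathcal{D})$ be a quasi-adjunction between $\mathcal{L}$ and $\mathcal{L}$; then we have the following:

(a) $\mathcal{D}\big(\bigsqcup_{t\in T} F^{(t)}\big) \simeq \bigsqcup_{t\in T} \mathcal{D}\big(F^{(t)}\big)$ for $\{F^{(t)} \in \mathcal{L};\ t \in T\}$.

(b) $\mathcal{E}\big(\bigsqcap_{t\in T} F^{(t)}\big) \simeq \bigsqcap_{t\in T} \mathcal{E}\big(F^{(t)}\big)$ for $\{F^{(t)} \in \mathcal{L};\ t \in T\}$.

(c) $\mathcal{E}(I) \simeq I$ and $\mathcal{D}(O) \simeq O$.

(d) $\mathcal{E}\mathcal{D} \succeq \Delta$ and $\mathcal{D}\mathcal{E} \preceq \Delta$.

(e) $\mathcal{E}\mathcal{D}\mathcal{E} \simeq \mathcal{E}$ and $\mathcal{D}\mathcal{E}\mathcal{D} \simeq \mathcal{D}$.

Other results for morphological operators over complete lattices can also be suitably modified to yield morphological operators over complete quasi-lattices. We next consider the question: What are the suitable types of mappings? To arrive at a reasonably general answer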
Suppose that we have a cartesian product set $\mathcal{L}$ with a binary relation $\preceq$, where $\mathcal{L} = \prod_{i\in I} \mathcal{L}_i$. We are in search of an ordinal utility function $Q : \mathcal{L} \mapsto \mathbb{R}$. For this, we assume that there exist functions $q_i : \mathcal{L}_i \mapsto \mathbb{R}$ for all $i \in I$ and a function $q : \prod_{i\in I} q_i(\mathcal{L}_i) \mapsto \mathbb{R}$, i.e., $q : \mathbb{R}^m \mapsto \mathbb{R}$. This function $q$ is called a composition rule. Let $F \in \mathcal{L}$; then
$$Q(F) = Q(f_1, f_2, \ldots, f_m) = q\big(q_1(f_1), q_2(f_2), \ldots, q_m(f_m)\big). \qquad (12)$$


The composition rules that have been of most interest in the measurement literature are those where $q$ is a polynomial. Let $a \in \mathbb{R}^m$; then
$$q(a) = q(a_1, a_2, \ldots, a_m) = \cdots \qquad (13)$$
where the coefficients are real numbers and the exponents are non-negative integers. Specific cases of this function lead to the Euclidean and generalized distance measures used in reduced ordering (Barnett 1976) and to the valuations used in quasi-metric lattices (Birkhoff 1973).
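A composition rule of the form (12) and the quasi-order it induces (Definition 10) can be sketched as follows. This is an illustration only: the choice $q_i(f) = f$ for three channels and $q(a) = a_1^2 + a_2^2 + a_3^2$ (a squared Euclidean norm, one simple polynomial rule) is a hypothetical example, and the helper names are made up. The example shows why the result is only a quasi-order: two distinct colour vectors with the same $Q$ value precede each other, so antisymmetry fails, while totality and transitivity hold.

```python
from itertools import product

def composition_rule_Q(qs, q):
    """Equation (12): Q(F) = q(q_1(f_1), ..., q_m(f_m))."""
    return lambda F: q(*(qi(fi) for qi, fi in zip(qs, F)))

# Hypothetical choice: q_i(f) = f and q(a) = a_1^2 + a_2^2 + a_3^2.
Q = composition_rule_Q([lambda f: f] * 3, lambda *a: sum(x * x for x in a))

def precedes(F, G):
    """Definition 10(a): F precedes G iff Q(F) <= Q(G)."""
    return Q(F) <= Q(G)

def equivalent(F, G):
    """Definition 10(b): F ~ G iff each precedes the other, i.e. Q(F) == Q(G)."""
    return precedes(F, G) and precedes(G, F)

def quasi_sup(S):
    """One representative of the supremum of a finite set; it is defined only up to ~."""
    return max(S, key=Q)

S = list(product(range(3), repeat=3))     # a small test set of 3-channel vectors
F, G = (1, 2, 2), (2, 2, 1)
print(Q(F), Q(G), equivalent(F, G), F == G)
```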

Lastly, we illustrate the process of morphological filtering on a colour image. Figure 1 shows a 512 × 480 colour image of Lenna. Each pixel of the image has three components, corresponding to the red, green and blue colour channels. This image corrupted with Max noise is shown in figure 2. In the Max noise model, each pixel is corrupted with probability $p$ (here we have taken $p = 0.5$), where the corrupting noise for each channel is an independent realization of a sequence of random variables uniformly distributed in $(0, 255)$. The noise corrupts the signal by taking the maximum of the image and the noise sequence. Figure 3 shows the result of applying the quasi-opening operator with a 3 × 3 flat structuring element, utilizing the Mahalanobis distance. It can be seen that even with such a high probability of noise occurrence, the morphological filter has significantly reduced the corrupting noise. Similarly, for the Min noise model, a quasi-closing operator can be used to filter the multichannel image.
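The Max noise model and a reduced-ordering quasi-opening with a flat 3 × 3 window can be sketched as below. This is not the authors' implementation: the paper's $Q$ is the Mahalanobis distance, whereas this sketch swaps in the squared Euclidean norm (distance from black) to stay self-contained; noise values are drawn as integers 0 to 255, windows are clipped at the image border, and ties are resolved by scan order. All of these are assumptions, and the function names are hypothetical. The quasi-erosion keeps, in each window, the pixel vector with the smallest $Q$, and the quasi-dilation the one with the largest, so $Q$ of the quasi-opened image equals the ordinary flat opening of the $Q$-image, in line with Proposition 21.

```python
import random

def max_noise(img, p, rng):
    """Max noise: with probability p a pixel is corrupted, and each channel c becomes
    max(c, u) with u an independent integer drawn uniformly from 0..255."""
    return [[tuple(max(c, rng.randint(0, 255)) for c in px) if rng.random() < p else px
             for px in row] for row in img]

def window(img, r, c, rad=1):
    """Values in the (2*rad+1) x (2*rad+1) flat window around (r, c), clipped at the border."""
    return [img[y][x]
            for y in range(max(0, r - rad), min(len(img), r + rad + 1))
            for x in range(max(0, c - rad), min(len(img[0]), c + rad + 1))]

def quasi_erode(img, Q):
    """Keep the pixel vector with the smallest Q in each window (ties: first in scan order)."""
    return [[min(window(img, r, c), key=Q) for c in range(len(img[0]))] for r in range(len(img))]

def quasi_dilate(img, Q):
    """Keep the pixel vector with the largest Q in each window."""
    return [[max(window(img, r, c), key=Q) for c in range(len(img[0]))] for r in range(len(img))]

def quasi_open(img, Q):
    return quasi_dilate(quasi_erode(img, Q), Q)

def flat_open(values):
    """Ordinary flat 3 x 3 opening of a scalar image, for comparison with Q(quasi_open)."""
    ero = [[min(window(values, r, c)) for c in range(len(values[0]))] for r in range(len(values))]
    return [[max(window(ero, r, c)) for c in range(len(ero[0]))] for r in range(len(ero))]

def Q(px):
    """Stand-in for the Mahalanobis distance: squared distance from black (assumption)."""
    return sum(v * v for v in px)

rng = random.Random(7)
clean = [[(100, 120, 80)] * 8 for _ in range(8)]
noisy = max_noise(clean, 0.5, rng)
filtered = quasi_open(noisy, Q)
```

Because each output vector is copied from the input, the filter never creates new colours, which is one practical attraction of reduced ordering over marginal ordering.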


5. Conclusions 

The objective of this paper is to provide a lattice theoretical framework for multivariate morphology. Within this framework, marginal ordering and reduced ordering schemes have been used to develop techniques for morphological processing of multichannel images. It has been shown that the marginal ordering principle essentially leads to the matrix morphology approach. Further, it has been shown that concepts of quasi-ordering need to be utilized to develop a theory of multivariate morphology based on the reduced ordering scheme. A new concept of complete quasi-lattice is introduced, which results in quasi-increasing operators, quasi-erosion/dilation operators, quasi-adjunctions etc. It remains to be seen what kinds of operators are suitable for a particular application at hand. This needs a detailed investigation to bring out the advantages and disadvantages of various schemes for specific multichannel image processing and analysis applications.
Lossless and lossy image compression using Boolean function minimization
Indian Institute of Science, Bangalore 560 012, India

Abstract. A novel approach for lossless as well as lossy compression of monochrome images using Boolean minimization is proposed. The image is split into bit planes. Each bit plane is divided into windows or blocks of variable size. Each block is transformed into a Boolean switching function in cubical form, treating the pixel values as the output of the function. Compression is performed by minimizing these switching functions using ESPRESSO, a cube-based two-level function minimizer. The minimized cubes are encoded using a code set which satisfies the prefix property. Our technique of lossless compression involves linear prediction as a preprocessing step and has a compression ratio comparable to that of the JPEG lossless compression technique. Our lossy compression technique involves reducing the number of bit planes as a preprocessing step, which incurs minimal loss in the information of the image. The bit planes that remain after preprocessing are compressed using our lossless compression technique based on Boolean minimization. Qualitatively, one cannot visually distinguish between the original image and the lossy image, and the value of mean square error is kept low. For a mean square error value close to that of the JPEG lossy compression technique, our method gives a better compression ratio. The compression scheme is relatively slower, while the decompression time is comparable to that of JPEG.

Keywords. Switching theory; Boolean minimization; image compression. 


1. Introduction 

Conventional lossless image compression techniques employ a decorrelation technique such as DPCM or Hierarchical Interpolation followed by a coding scheme such as Huffman or arithmetic coding (Roos et al 1988). In this paper we propose a radically different approach to coding, based on finding the minimal Boolean function representation for sub-blocks of an image bit plane. The emphasis in this paper is on the coding scheme; further investigation is needed to determine the decorrelation method that best suits our coding scheme.

In lossless as well as lossy image compression, the state of the art is the JPEG standard (Pennebaker & Mitchell 1993). For lossless compression, JPEG employs a decorrelation scheme based on DPCM (several predictors are provided to the user) followed by adaptive Huffman or arithmetic coding. The lossy JPEG processes are based on the Discrete Cosine Transform (DCT) and entropy coding of the quantized DCT coefficients based on adaptive Huffman or arithmetic coding. Our techniques for lossless and lossy compression are based on the minimization of Boolean switching functions.

In lossless compression, an error file is obtained by applying a simple linear prediction to the original image file. The error file is then split into bit planes. Each bit plane is a binary image which is divided into variable sized blocks using the quad tree approach, and each block is converted into a Boolean switching function in cubical form. The functions are then minimized using the well known cube-based two-level logic minimizer ESPRESSO (Brayton et al 1984). The minimized cubes, which represent the implicants (product terms) of the function, are then coded with a code set which satisfies the prefix property to obtain the compressed data. The results are compared with the JPEG lossless mode of compression.

In lossy compression, we reduce the number of gray levels in the original image to at most 32 by applying the Centre of Mass technique. The image containing at most 32 gray levels is recoded by mapping the gray values to a 5 bit gray code, thus reducing the number of bit planes from 8 to 5. The mapping function leads to a fixed overhead of only 96 bits. These 5 bit planes are processed in the same manner as in the lossless compression scheme. The results are compared with the JPEG lossy compression technique.

2. Background 


The basic idea of employing Boolean minimization for image compression has been reported in an earlier work (Augustine et al 1995). However, in that paper only lossless compression of bi-level (black and white) images was considered. In the present work, we have extended the approach to both lossless and lossy compression of gray level images, and our technique can easily be applied to full colour images as well. We also consider the division of each bit plane into variable sized windows or blocks using a quad tree structure, which leads to higher compression efficiency than the uniform division into fixed sized blocks used by Augustine et al (1995). For the sake of clarity of presentation, some basic definitions are reproduced below from Augustine et al (1995).

DEFINITION 1 

A Boolean switching function $F$ is a mapping $F : B^N \to B$, where $B = \{0, 1\}$.
DEFINITION 2 

In the truth table of a switching function of $N$ variables, there are $2^N$ rows. Each of these rows, which represents an input state vector, is called a minterm.

DEFINITION 3 

In a switching function, the ON-set is the set of minterms whose outputs are mapped to 1 
and the OFF-set is the set of minterms whose outputs are mapped to 0. 

DEFINITION 4 

A cube is an $N$-tuple $A = \{a_1, a_2, \ldots, a_N\}$, where $a_i \in \{0, 1, X\}$. The dimension of the cube is the number of Xs in it. An $\alpha$-cube has $2^\alpha$ minterms (zero-cubes) within it (Biswas 1993; Breuer 1972).
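The relation between a cube and the minterms it covers can be sketched in a few lines of Python. The cube is written as a string over the symbols 0, 1 and X; the function names are illustrative only. Each X is expanded into both 0 and 1, so an $\alpha$-cube yields $2^\alpha$ minterms.

```python
from itertools import product

def minterms(cube: str):
    """All minterms (zero-cubes) covered by a cube written over the symbols 0, 1, X."""
    if set(cube) - set("01X"):
        raise ValueError("cube symbols must be 0, 1 or X")
    choices = [("0", "1") if s == "X" else (s,) for s in cube]
    return ["".join(bits) for bits in product(*choices)]

def dimension(cube: str) -> int:
    """Definition 4: the dimension of a cube is its number of Xs."""
    return cube.count("X")

print(minterms("1X0X"))
```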
DEFINITION 5

The performance of a lossless data compression algorithm is assessed by two parameters: compression ratio and run time. We define the compression ratio as
$$\text{compression ratio} = \frac{\text{no. of input bytes} - \text{no. of output bytes}}{\text{no. of input bytes}} \times 100\%.$$


3. The lossless compression scheme


The block diagram of our lossless data compression scheme is given in figure 1. The compression scheme consists of the steps of linear prediction, function generation, function minimization and cube encoding.


3.1 Linear prediction 

First we apply linear prediction to the original image file. A simple predictor that averages two adjacent pixel values A and B, as shown in figure 2, is used. The error values obtained are recoded by adding the magnitude of the most negative error value to each error, so that the error file contains only non-negative values.
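The prediction step and its inverse can be sketched as follows. Since figure 2 is not reproduced here, this sketch assumes that A is the left neighbour and B the upper neighbour of the predicted pixel X, that the average is rounded down, and that border pixels are predicted from the single available neighbour (or from 0 at the top-left corner); these choices, and the function names, are assumptions for illustration. The decoder repeats the same causal prediction, so the scheme is exactly reversible.

```python
import random

def predict(img, r, c):
    """X = floor((A + B) / 2) with A the left and B the upper neighbour (assumed layout);
    at the borders the single available neighbour, or 0 at the corner, is used."""
    if r > 0 and c > 0:
        return (img[r][c - 1] + img[r - 1][c]) // 2
    if c > 0:
        return img[r][c - 1]
    if r > 0:
        return img[r - 1][c]
    return 0

def error_file(img):
    """Prediction errors, shifted by the magnitude of the most negative error."""
    err = [[img[r][c] - predict(img, r, c) for c in range(len(img[0]))] for r in range(len(img))]
    offset = max(0, -min(min(row) for row in err))
    return [[e + offset for e in row] for row in err], offset

def reverse_prediction(shifted, offset):
    """Decoder side: rebuild the image causally from the shifted errors."""
    img = [[0] * len(shifted[0]) for _ in shifted]
    for r in range(len(shifted)):
        for c in range(len(shifted[0])):
            img[r][c] = shifted[r][c] - offset + predict(img, r, c)
    return img

rng = random.Random(3)
image = [[rng.randint(0, 255) for _ in range(6)] for _ in range(5)]
shifted, offset = error_file(image)
```

For 8-bit input the errors lie between -255 and 255, so the shifted error file can need a ninth bit plane, which is consistent with the 8 or 9 bit planes mentioned in §3.2.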


3.2 Function generation 

The values in the error file are replaced by their equivalent Gray code, so that adjacent error values are mapped onto logically adjacent codes, leading to fewer transitions between 0s and 1s on the bit planes. The error file is then split into bit planes. The number of bit planes can be 8 or 9, depending on the number of error values obtained: if there are more than 256 error values, there will be 9 bit planes. The bit planes are divided into windows of variable size using the quad tree approach. A switching function in cubical form is generated for each window by assigning the pixels to minterms according to the Gray code. The Gray code is chosen because of its unit distance property: assigning geometrically adjacent pixels of a window to logically adjacent minterms captures the correlation likely to be present among adjacent pixels. This assignment helps minimization, since any $2^\alpha$ logically adjacent minterms that together form a subcube combine into a single $\alpha$-cube. Pixels are scanned row-wise, with the direction reversed on alternate rows, to ensure that pixels at the ends of consecutive lines are mapped to logically adjacent codes. A more detailed explanation of function generation can be found in Augustine et al (1995).

Figure 1. Lossless image compression scheme (block diagram: original image → error file → Boolean functions in cubical form → minimized cubes → compressed image).

Figure 2. Sample predictor pixels (X = (A + B)/2).
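Function generation can be sketched as three small steps: Gray-code the values, split them into bit planes, and turn each square block into ON-set and OFF-set minterms. The exact minterm assignment of Augustine et al (1995) is not reproduced here, so this sketch assumes a concrete one: pixels are visited in the row-reversing (serpentine) scan described above, and the t-th pixel in scan order is given minterm gray(t). Consecutive pixels in the scan, including the last pixel of one row and the first of the next, then receive minterms differing in a single bit. The function names are hypothetical.

```python
def gray(v: int) -> int:
    """Binary-reflected Gray code of a non-negative integer."""
    return v ^ (v >> 1)

def bit_planes(values, nbits):
    """Map every value to its Gray code and split into nbits binary planes (plane 0 = LSB)."""
    return [[[(gray(v) >> b) & 1 for v in row] for row in values] for b in range(nbits)]

def block_function(block):
    """ON-set and OFF-set minterms for a 2^k x 2^k binary block, using a serpentine scan
    and assigning the t-th scanned pixel to minterm gray(t) (assumed assignment)."""
    side = len(block)
    nvars = (side * side).bit_length() - 1
    on, off, t = [], [], 0
    for r, row in enumerate(block):
        cols = range(side) if r % 2 == 0 else range(side - 1, -1, -1)
        for c in cols:
            (on if row[c] else off).append(format(gray(t), f"0{nvars}b"))
            t += 1
    return on, off

corner_block = [[1, 1, 0, 0], [1, 1, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]]
on, off = block_function(corner_block)
print(on)
```

With this assignment the 2 × 2 corner of ones maps to the minterms 0000, 0001, 0101 and 0100, which together form the single 2-cube 0X0X.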


3.3 Quad tree approach to bit plane segmentation 

Each bit plane is divided into sub-planes (windows or blocks) of variable sizes using the quad tree approach. This is done because the regions of correlation on the bit planes can be large or small, and capturing such variable size regions of high correlation results in better compression.

In the quad tree approach to dividing the bit plane into variable sized windows, a bit plane is treated as a collection of leaf nodes. Given a $2^n \times 2^n$ array of pixels, a quad tree is constructed by repeatedly subdividing the array into quadrants, subquadrants, etc., until we reach the smallest window size that we have decided on. We have employed a maximum window size of 32 × 32 pixels and a minimum of 4 × 4. To decide whether a given block has to be further subdivided, we have used a simple heuristic criterion: if a block consists of 98% or more black pixels, or 98% or more white pixels, we retain it without further division; otherwise it is divided into four equal sub-blocks and the same criterion is applied to each sub-block. Once the sub-block size reaches 4 × 4 pixels, we retain it without further division, even when the criterion is not satisfied.
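The subdivision rule can be sketched recursively. The 98% criterion and the 32 × 32 and 4 × 4 size limits are taken from the text; tiling the bit plane with 32 × 32 roots (plane sides assumed to be multiples of 32), the quadrant order NW, NE, SW, SE and the function names are assumptions made for this illustration.

```python
def subdivide(plane, r0, c0, size, min_size=4, threshold=0.98):
    """Leaf windows (row, col, size) of the quad tree for one square block.
    A block is kept if at least `threshold` of its pixels are all 0 or all 1, or if it
    has reached `min_size`; otherwise it is split into quadrants NW, NE, SW, SE."""
    ones = sum(plane[r][c] for r in range(r0, r0 + size) for c in range(c0, c0 + size))
    if size <= min_size or max(ones, size * size - ones) >= threshold * size * size:
        return [(r0, c0, size)]
    half = size // 2
    leaves = []
    for dr, dc in ((0, 0), (0, half), (half, 0), (half, half)):
        leaves += subdivide(plane, r0 + dr, c0 + dc, half, min_size, threshold)
    return leaves

def segment_bit_plane(plane, max_size=32, min_size=4, threshold=0.98):
    """Tile a bit plane with max_size x max_size roots (sides assumed to be multiples of
    max_size) and return the leaf windows of every root's quad tree."""
    leaves = []
    for r in range(0, len(plane), max_size):
        for c in range(0, len(plane[0]), max_size):
            leaves += subdivide(plane, r, c, max_size, min_size, threshold)
    return leaves

corner = [[1 if r < 16 and c < 16 else 0 for c in range(32)] for r in range(32)]
print(segment_bit_plane(corner))
```

A uniform plane stays a single 32 × 32 leaf, while a checkerboard, which never meets the criterion, is split all the way down to sixty-four 4 × 4 leaves.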
Figure 3. A bit plane and its quad tree representation. (a) A bit plane of size 32 × 32; (b) quad tree representation of the above bit map.
The process of subdividing the bit plane into quadrants, subquadrants, etc., can be represented by a tree of out-degree 4, in which the root node corresponds to the entire bit plane, the four sons of the root node correspond to the quadrants, and the terminal or leaf nodes correspond to those windows of the bit plane for which no further subdivision is necessary (Samet 1985). The nodes at level $k$ (if any) represent blocks of size $2^k \times 2^k$ and are referred to as nodes of size $2^k$. The tree is created in a depth-first manner and is therefore coded in preorder form in the compressed file.

An example of a bit plane divided using the quad tree approach is shown in figure 3. The bit plane is of size 32 × 32. The quad tree representation of the bit plane is shown in figure 3b, where a small box indicates a terminal (leaf) node. The preorder listing of the quad tree is

A B D 1 2 3 4 5 6 7 8 9 C 10 11 12 13.

The preorder listing of the quad tree indicates the manner in which the bit plane was divided. The letters A, B, C and D indicate the non-terminal nodes. The terminal nodes 8 and 9 are of size 16 × 16, nodes 5, 6, 7, 10, 11, 12 and 13 are of size 8 × 8, and nodes 1, 2, 3 and 4 are of size 4 × 4. The quad tree structure can be represented efficiently using only 1 bit for each node (Samet 1985).
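The preorder listing and a one-bit-per-node code can be reproduced for the example of figure 3. The tree below is reconstructed from the preorder listing and the node sizes given in the text (each internal node has four children); the bit convention (1 for an internal node, 0 for a leaf) is the generic preorder scheme and is an assumption, since the exact on-file layout of §4 may differ. The function names are hypothetical.

```python
# A leaf is a label (str); an internal node is (label, [four children]) in quadrant order.
FIG3 = ("A", [("B", [("D", ["1", "2", "3", "4"]), "5", "6", "7"]),
              "8", "9",
              ("C", ["10", "11", "12", "13"])])

def preorder(node):
    """Preorder listing of node labels."""
    if isinstance(node, str):
        return [node]
    label, children = node
    return [label] + [x for ch in children for x in preorder(ch)]

def shape_bits(node):
    """One bit per node in preorder: 1 for an internal node, 0 for a leaf."""
    if isinstance(node, str):
        return "0"
    return "1" + "".join(shape_bits(ch) for ch in node[1])

def leaf_sizes(bits, root_size):
    """Decode the bit string back into the sizes of the leaf windows, in preorder."""
    sizes = []

    def walk(pos, size):
        if bits[pos] == "0":
            sizes.append(size)
            return pos + 1
        pos += 1
        for _ in range(4):
            pos = walk(pos, size // 2)
        return pos

    if walk(0, root_size) != len(bits):
        raise ValueError("trailing bits after the quad tree")
    return sizes

print(" ".join(preorder(FIG3)))
print(shape_bits(FIG3))
```

The 17 nodes of the example therefore cost 17 bits, and decoding them with a root size of 32 recovers the leaf sizes quoted in the text.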

3.4 Function minimization 

Boolean function minimization is performed on the function generated for each block,
using the two-level cube-based logic minimizer ESPRESSO (Brayton et al 1984) to find
an equivalent minimized cubical representation. For a particular function, the minimized
ON-set and OFF-set generally contain different numbers of cubes. Since both represent
the same function, better compression is achieved by choosing the set with fewer cubes.
The choice between the ON-set and the OFF-set is passed to the decoder in the header of
each block, as discussed in § 4.
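ESPRESSO itself is not reproduced here. As a stand-in, the sketch below uses a textbook Quine-McCluskey prime-implicant step followed by a greedy cover. This does not guarantee a minimum cover, so it is only an illustration of the ON-set versus OFF-set choice described above. Two further assumptions are that pixels are indexed in row-major order and that cube position j is variable j counted from the most significant index bit. All function names are hypothetical.

```python
from itertools import combinations

def prime_implicants(minterms, n):
    """Quine-McCluskey merging; cubes are strings over '0', '1', 'X' (MSB first)."""
    cubes = {format(m, f"0{n}b") for m in minterms}
    primes = set()
    while cubes:
        merged, used = set(), set()
        for a, b in combinations(sorted(cubes), 2):
            diff = [i for i in range(n) if a[i] != b[i]]
            # Two cubes merge when they differ in exactly one non-X position.
            if len(diff) == 1 and "X" not in (a[diff[0]], b[diff[0]]):
                i = diff[0]
                merged.add(a[:i] + "X" + a[i + 1:])
                used.update((a, b))
        primes |= cubes - used
        cubes = merged
    return primes

def covers(cube, m, n):
    """True if minterm m lies inside the cube."""
    return all(c in ("X", b) for c, b in zip(cube, format(m, f"0{n}b")))

def greedy_cover(minterms, n):
    """Pick prime implicants greedily until every minterm is covered (not guaranteed minimal)."""
    primes = sorted(prime_implicants(minterms, n))
    remaining, cover = set(minterms), []
    while remaining:
        best = max(primes, key=lambda cube: sum(covers(cube, m, n) for m in remaining))
        cover.append(best)
        remaining = {m for m in remaining if not covers(best, m, n)}
    return cover

def minimize_window(bits):
    """bits: window pixels in row-major order; length must be a power of two."""
    n = len(bits).bit_length() - 1
    on_set = [i for i, b in enumerate(bits) if b]
    off_set = [i for i, b in enumerate(bits) if not b]
    on_cover, off_cover = greedy_cover(on_set, n), greedy_cover(off_set, n)
    # Keep whichever set needs fewer cubes; ties go to the ON set.
    if len(on_cover) <= len(off_cover):
        return "ON", on_cover
    return "OFF", off_cover

window = [1, 1, 0, 0] * 4          # 4x4 window: left half 1-pixels, right half 0-pixels
print(minimize_window(window))     # ('ON', ['XX0X'])
```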

3.5 Cube encoding 

In cube encoding, the set of minimized cubes of the function corresponding to each window
is coded separately. The prefix code set {0, 10, 11} is used for the cube symbols 0, 1 and
X, with the one-bit code allotted to the symbol that occurs most frequently. One or two
bits are needed to record how the prefix codes are allotted to the cube symbols. If
minimization fails to achieve compression for a window, the original window is stored
unchanged in the compressed image, and this is indicated in the window header.
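A minimal sketch of this frequency-based allotment follows. The three allotments in the table are the ones listed in § 4. The function name and the tie-breaking order (0, then 1, then X) are our assumptions.

```python
from collections import Counter

# Allotment bits -> symbols receiving the codes 0, 10 and 11 (the three cases listed in § 4).
ALLOTMENTS = {"0": "01X", "10": "1X0", "11": "X01"}
CODES = ("0", "10", "11")

def encode_cubes(cubes):
    """Return (allotment_bits, payload) for a list of cubes over '0', '1', 'X'."""
    freq = Counter("".join(cubes))
    top = max("01X", key=lambda s: freq[s])          # most frequent symbol gets the 1-bit code
    allot = next(a for a, order in ALLOTMENTS.items() if order[0] == top)
    code_of = dict(zip(ALLOTMENTS[allot], CODES))
    payload = "".join(code_of[sym] for cube in cubes for sym in cube)
    return allot, payload

print(encode_cubes(["XX0X"]))   # ('11', '00100')
```

In this example X is the most frequent symbol, so it receives the one-bit code. The cube then costs 5 bits instead of the 8 needed by a fixed two-bit code.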


4. Format of the compressed image 

The compressed file has a global header containing the size of the image and the
information required for reverse prediction. For each bit plane, the windows (blocks) are
encoded in the format given in figure 4. Since we consider only window sizes of 32 x 32,
16 x 16, 8 x 8 and 4 x 4, the quad tree bits associated with any terminal node number 1, 2
or 3, depending on the level of the node in the quad tree structure. The one-bit status code
for the window is interpreted as follows.

Figure 4. Window format.


• 0 : Window without minimization. The minimization algorithm failed to produce
compression for this window, and its bits are stored as in the original window.

• 1 : Window is minimized and encoded as cubes. 


In the latter case, the next two bits indicate further characteristics of the window. These
two bits, denoting the encoding scheme, have the following interpretation.

• 00 : ON set is minimized and stored as cubes. 

• 01 : OFF set is minimized and stored as cubes. 

• 10 : window consists of all black pixels. 

• 11 : window consists of all white pixels.

For the last two cases, no further bits are needed to encode the window. In the first two
cases, however, the next one or two bits, as shown below, indicate the allotment of prefix
codes to the cube symbols.

• 0 : 0 → 0, 10 → 1, 11 → X.

• 10 : 0 → 1, 10 → X, 11 → 0.

• 11 : 0 → X, 10 → 0, 11 → 1.

After the prefix code allotment bits, the next m bits (m varies between 0 and 6 depending
on the window size) indicate the number of cubes. The encoded cubes follow (figure 4).
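The field layout above can be read back with a small parser. This sketch has three assumptions. It starts at the window's status bit, so the quad tree bits that precede it in figure 4 are not handled. The width m of the cube-count field is passed in as a hypothetical parameter, assumed to be at least 1, because the text does not give the exact rule. Bits are represented as a Python string.

```python
ALLOTMENTS = {"0": "01X", "10": "1X0", "11": "X01"}  # allotment bits -> symbols for codes 0, 10, 11

def read_window(bits, n_vars, m):
    """Parse one window record, starting at its status bit.

    n_vars: number of Boolean variables (the window holds 2**n_vars pixels).
    m: width of the cube-count field; a hypothetical parameter here, assumed >= 1.
    Returns (description, position just after the record).
    """
    pos = 0

    def take(k):
        nonlocal pos
        chunk = bits[pos:pos + k]
        if len(chunk) != k:
            raise ValueError("truncated window record")
        pos += k
        return chunk

    if take(1) == "0":                              # stored without minimization
        return {"kind": "raw", "pixels": take(2 ** n_vars)}, pos
    scheme = take(2)
    if scheme in ("10", "11"):
        return {"kind": "all black" if scheme == "10" else "all white"}, pos
    allot = take(1)
    if allot == "1":                                # allotments 10 and 11 use two bits
        allot += take(1)
    symbols = ALLOTMENTS[allot]
    n_cubes = int(take(m), 2)
    cubes = []
    for _ in range(n_cubes):
        cube = ""
        for _ in range(n_vars):
            code = take(1)
            if code == "1":
                code += take(1)
            cube += symbols[("0", "10", "11").index(code)]
        cubes.append(cube)
    return {"kind": "ON" if scheme == "00" else "OFF", "cubes": cubes}, pos

record = "1" + "00" + "11" + "001" + "00100"   # status, scheme, allotment, count (m = 3), one cube
print(read_window(record, n_vars=4, m=3))     # ({'kind': 'ON', 'cubes': ['XX0X']}, 13)
```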

Many windows in a bit plane can be compressed, either by logic minimization or because
they are entirely black or entirely white. The remaining windows, however, may yield no
compression, so that the bit plane as a whole expands. This usually happens for the two or
three least significant bit planes of most images, which are more random than the higher
bit planes. In such a case, the entire bit plane is stored as it is in the compressed file. A
one-bit header associated with each bit plane indicates whether that bit plane is stored in
original or compressed form.


5. Decompression scheme 

Decompression procedure consists of three steps, namely, 

1. Window recovery 

2. Bit plane recovery 

3. Image recovery 





Table 1. Results of the lossless compression experiment.

                 Logic coding                     JPEG lossless mode
Test image   comp.    decomp.   compr.        comp.    decomp.   compr.
             time*    time*     ratio %       time*    time*     ratio %
baboon       195      2.8       10.7          0.4      0.3       13.7
boats        132      2.6       31.5          0.4      0.3       32.9
girl         125      2.6       37.0          0.4      0.3       38.1
average      151      2.7       26.4          0.4      0.3       28.2

* CPU time in seconds on IBM RS-6000/580


Window recovery consists of locating the block of data in the compressed file that
corresponds to a bit plane window and reconstructing the original bits of this window. If
the window has not been compressed, recovery is straightforward. Otherwise, the two
encoding-scheme bits indicate whether the ON or OFF set has been minimized and stored
as cubes, or whether the window is all black or all white. For a minimized ON/OFF set,
the cubes of the minimized function are recovered from the number of cubes and the
allotment of prefix codes to the cube symbols. Once the minimized ON/OFF set of the
function is available in cubical form, the bits of the original window are obtained by
expanding the function to its truth table form.
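The expansion step can be sketched as follows. The variable ordering (cube position j is bit n_vars - 1 - j of the row-major pixel index) is an assumption. It matches the ordering used in the minimization sketch earlier in this section, but it is not specified in the text.

```python
def expand_cubes(cubes, n_vars, stored_set="ON"):
    """Rebuild the 2**n_vars window bits from a minimized ON or OFF set.

    Cube position j is taken to be bit (n_vars - 1 - j) of the pixel index,
    i.e. most significant bit first (an assumed variable ordering).
    """
    def covered(index):
        bits = format(index, f"0{n_vars}b")
        return any(all(c in ("X", b) for c, b in zip(cube, bits)) for cube in cubes)

    value_if_covered = 1 if stored_set == "ON" else 0
    return [value_if_covered if covered(i) else 1 - value_if_covered
            for i in range(2 ** n_vars)]

print(expand_cubes(["XX0X"], 4))          # [1, 1, 0, 0, 1, 1, 0, 0, 1, 1, 0, 0, 1, 1, 0, 0]
print(expand_cubes(["XX1X"], 4, "OFF"))   # the same window, recovered from its OFF set
```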

Bit plane recovery consists of reconstructing an entire bit plane from the recovered 
windows of this bit plane, using information from the quad tree representation for the 
different windows. 

Image recovery consists of combining the individual bit planes of the image. Since linear 
prediction was employed as a preprocessing step, image recovery first yields the error file 
from which the original gray values are obtained by reverse prediction. 
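The last two recovery steps can be sketched as below. The predictor of figure 2 is not reproduced in this section. The sketch therefore uses a hypothetical stand-in: each pixel is predicted by its left neighbour, and prediction errors are assumed to be stored modulo 256 so that they fit in 8 bit planes. Only the bit-plane recombination is generic; the predictor and the modulo convention are assumptions.

```python
import numpy as np

def combine_bit_planes(planes):
    """planes[b] holds bit b (b = 0 is the LSB) of every pixel."""
    value = np.zeros(planes[0].shape, dtype=np.uint8)
    for b, plane in enumerate(planes):
        value |= (plane.astype(np.uint8) & 1) << b
    return value

def reverse_prediction(errors):
    """Hypothetical stand-in for the predictor of figure 2: each pixel is
    predicted by its left neighbour (the first column by 0), and the
    error is assumed to be stored modulo 256."""
    image = np.zeros(errors.shape, dtype=np.int64)
    for col in range(errors.shape[1]):
        left = image[:, col - 1] if col > 0 else 0
        image[:, col] = (left + errors[:, col].astype(np.int64)) % 256
    return image.astype(np.uint8)
```

Run in reverse on an error file produced with the same hypothetical predictor, the two functions return the original gray values exactly. This is what makes the scheme lossless.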


6. Experimental results 

The proposed compression and decompression schemes, tentatively called Logic coding,
have been implemented in C on an IBM RS-6000/580 workstation running UNIX. They were
tested on gray level images of size 256 x 256 pixels (64 kbytes). Table 1 compares the
performance of our scheme with that of JPEG lossless mode for three standard gray scale
images: baboon, boats and girl. The experiments used the PVRG-JPEG (Portable Video
Research Group) Codec 1.1, available through the internet from havefun.stanford.edu.
Our lossless compression scheme is comparable to JPEG lossless mode in compression
ratio (26.4% versus 28.2% on average). The predictor used by JPEG is the same as the one
used in our scheme, shown in figure 2. JPEG needs far less CPU time for encoding (0.4 s
versus 151 s on average). The gap is much smaller for decompression (0.3 s versus 2.7 s).


7. Lossy compression 

Our lossy compression scheme consists of the following four steps. 

• Reducing the number of bit planes from eight to five

• Function generation

• Function minimization

• Cube encoding

7.1 Reducing the number of bit planes 

To reduce the number of bit planes, we first reduce the number of gray levels in the image
to 32 or fewer, using a technique called the Centre Of Mass technique. It exploits the fact
that the human eye cannot detect very small changes in gray value: a single gray value is
substituted for a group of adjacent gray level values, as explained below.

We first divide the range 0 to 255 of gray values into 32 intervals of 8 gray values each:
0 to 7, 8 to 15, 16 to 23, ..., 248 to 255.

The frequency count of each gray value in the original image is computed. Then, for each
interval k, ranging from 0 to 31, we calculate a gray value gl_new (representing the centre
of mass of that interval) using the formula

$$ gl_{new} = \frac{\sum_{i=8k}^{8k+7} i \, fr_i}{\sum_{i=8k}^{8k+7} fr_i}, $$

where i is the gray level value (between 0 and 255) and fr_i is the frequency of occurrence
of the ith gray level in the image. This gives one gray value gl_new for each interval, and
it is substituted for all 8 gray levels falling within that interval. A fractional result is
rounded to the nearest integer. The value gl_new is thus biased towards the gray levels that
occur more frequently, which helps to reduce the mean square error. A further advantage is
that the error is bounded (it cannot exceed 7 gray levels, since gl_new lies in the same
interval as the original value) and is not localized but distributed throughout the image.
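The centre-of-mass recoding can be sketched in a few lines. The formula leaves two cases open, and we resolve both by assumption. An empty interval (zero denominator) is mapped to its first gray level. A value exactly halfway between two integers is rounded up. The function name is hypothetical.

```python
import numpy as np

def centre_of_mass_table(image):
    """Map each of the 256 gray levels to the rounded centre of mass of its
    8-level interval, weighted by the image histogram."""
    freq = np.bincount(image.ravel(), minlength=256).astype(float)
    table = np.empty(256, dtype=np.uint8)
    for k in range(32):
        levels = np.arange(8 * k, 8 * k + 8)
        weight = freq[levels].sum()
        # Empty interval: the formula is undefined; using the interval start is our assumption.
        centre = (levels * freq[levels]).sum() / weight if weight else 8 * k
        table[levels] = int(np.floor(centre + 0.5))   # round half up to the nearest integer
    return table

image = np.array([[0, 0, 0, 7], [100, 101, 102, 103]], dtype=np.uint8)
table = centre_of_mass_table(image)
print(table[image].tolist())   # [[2, 2, 2, 2], [102, 102, 102, 102]]
```

In the first row, the three occurrences of level 0 pull the centre of mass of interval 0 down to 1.75, which rounds to 2.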

The recoded image has at most 32 gray values, each denoting the centre of mass of the
interval in which the original gray value falls. The five MSBs of each recoded gray value
identify the interval, while the three LSBs give the offset from the start of the interval to
its centre of mass. We can keep these offsets in an array of 32 x 3 bits and recode the
intervals with a 5-bit Gray code. The recoded image thus consists of 5 bit planes, and the
32 x 3-bit overhead for the offsets is included in the global header.
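The following sketch shows the split into five Gray-coded bit planes and a 12-byte offset header (32 x 3 bits = 96 bits = 12 bytes), together with its inverse. The bit layout of the header (interval 0 in the most significant bits) and the function names are assumptions.

```python
import numpy as np

def to_gray(k):
    return k ^ (k >> 1)

def from_gray(g):
    k = 0
    while g:
        k ^= g
        g >>= 1
    return k

def split_recoded(recoded, table):
    """recoded: image whose values come from a 256-entry centre-of-mass table.

    Returns five bit planes (MSB first) of the Gray-coded interval index and
    the 32 x 3-bit offset array packed into 12 bytes (assumed bit layout).
    """
    index = recoded.astype(np.uint8) >> 3                  # interval k, 0..31
    gray = to_gray(index)
    planes = [(gray >> b) & 1 for b in range(4, -1, -1)]
    packed = 0
    for k in range(32):
        packed = (packed << 3) | (int(table[8 * k]) & 7)    # offset within interval k
    return planes, packed.to_bytes(12, "big")              # 96 bits = 12 bytes

def recover(planes, header):
    packed = int.from_bytes(header, "big")
    offsets = [(packed >> (3 * (31 - k))) & 7 for k in range(32)]
    gray = sum(p.astype(int) << (4 - i) for i, p in enumerate(planes))
    index = np.vectorize(from_gray)(gray)
    return 8 * index + np.take(offsets, index)

table = np.repeat(np.arange(32) * 8 + np.arange(32) % 8, 8)   # toy table: interval k has offset k mod 8
recoded = table[np.arange(256).reshape(16, 16)]
planes, header = split_recoded(recoded, table)
print(len(planes), len(header), np.array_equal(recover(planes, header), recoded))   # 5 12 True
```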

This is the only step in which loss is incurred. Visually, one cannot distinguish between
the original image and the recoded image having only 32 or fewer gray levels.

The five bit planes obtained from the Centre Of Mass technique will now undergo steps 
of function generation, function minimization and cube encoding in the same manner as 
explained in the lossless compression scheme. 

7.2 Compressed file format 

The compressed image format is the same as that of the lossless compression scheme,
except for the global header. The global header contains 12 bytes (32 x 3 bits) holding the
offset of the centre of mass within each interval. Only five encoded bit planes are present
in the compressed file. The window format is the same as shown in figure 4.





Table 2. Results of the lossy compression experiment.

             Logic coding                      JPEG lossy mode
Test image   MSE    comp.   compr.       Q-factor   MSE    comp.   compr.
                    time*   ratio %                        time*   ratio %
baboon       5.45   150     47.85        95         4.16   0.5     33.37
                                         94         5.68   0.5     38.07
boats        5.47   94      65.85        93         5.10   0.4     62.39
                                         92         6.12   0.4     64.98
girl         5.40   85      67.87        93         5.14   0.4     66.88
                                         92         5.94   0.4     69.60
average      5.44   110     60.52                   5.36   0.43    55.88

* CPU time in seconds on IBM RS-6000/580


7.3 Experimental results 

Table 2 gives the results of our scheme and of JPEG. We use the mean square error (MSE)
as the reference for comparing compression ratios. To bring the MSE of the JPEG
decompressed file close to ours, we tried different values of the quality factor (Q-factor),
an option available in JPEG. The MSE of the JPEG decompressed image never matched
ours exactly, so we give results for the two Q-factors whose MSE is closest to ours. For an
average MSE close to that of JPEG (5.44 versus 5.36), our average compression ratio is
higher by 4.64 percentage points (60.52% versus 55.88%). JPEG, however, requires much
less compression time (0.43 s versus 110 s on average). The decompressed images were
visually indistinguishable from the originals for both our scheme and JPEG.
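The 4.64 figure is the difference between the two average compression ratios in table 2. The JPEG average is taken over both Q-factors of every image. A quick arithmetic check, using only the values in the table:

```python
logic = {"baboon": 47.85, "boats": 65.85, "girl": 67.87}
jpeg = {"baboon": (33.37, 38.07), "boats": (62.39, 64.98), "girl": (66.88, 69.60)}

avg_logic = sum(logic.values()) / len(logic)
jpeg_all = [ratio for pair in jpeg.values() for ratio in pair]
avg_jpeg = sum(jpeg_all) / len(jpeg_all)
print(f"{avg_logic:.2f} {avg_jpeg:.2f} {avg_logic - avg_jpeg:.2f}")   # 60.52 55.88 4.64
```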


8. Conclusions 

A novel approach to image data compression using the Boolean function minimization
technique has been presented. After the preprocessing step, the image is split into bit
planes. Each bit plane is split into variable sized blocks. These blocks are converted into a
set of Boolean switching functions and minimized using ESPRESSO to obtain a minimal
representation in cubical form. A prefix code set is used to encode the cube symbols to
achieve maximum possible compression.

For lossless compression, our experiments show that our technique gives a compression
ratio comparable to that of JPEG lossless mode. For lossy compression, at an average mean
square error close to that of JPEG, our compression ratio is better by 4.64 percentage
points. JPEG requires less execution time than our technique. Because ESPRESSO is used
as a stand-alone package, considerable time is lost in communication and file manipulation
overheads. Time performance is expected to improve once ESPRESSO is integrated into
the code.

A two-level minimized representation is not the most compact representation for all
functions. For many functions, such as the parity function, even the minimum two-level
representation leads to an expansion. For such functions, alternative representations such
as Binary Decision Diagrams (BDDs) may be employed (Akers 1978). Further work is also
required to study the performance of our lossy compression approach at higher MSE
levels and to compare it with JPEG's performance.
Abstract. M-channel maximally decimated filter banks have been used in the past to
decompose signals into subbands. The theory of perfect-reconstruction filter banks has
also been studied extensively. A class of filter banks that has received particular attention
is the so-called paraunitary filter banks. These filter banks are attractive because robust
structures exist for implementing them, based on factorizations of their polyphase
matrices. In this paper, we impose two further conditions on the paraunitary filter bank:
the linear phase property of the filters, and their pairwise mirror-image symmetry in the
frequency domain. These two properties are useful in several applications, including
image processing. We propose a factorization for filter banks having all three properties
simultaneously, namely paraunitariness, linear phase, and pairwise mirror-image
symmetry. We then prove that our factorization is minimal and complete. This means that
the number of delays used is equal to the McMillan degree, and that all filter banks
satisfying the three properties simultaneously can be factored in terms of our
factorization. Our factorization therefore gives a robust structure for implementing these
systems: all three properties are guaranteed to be satisfied in spite of multiplier
quantization.

We are not aware of any other factorization that has been proved minimal and complete
for three properties imposed simultaneously.

Keywords. Paraunitary filter banks; linear phase; factorizations. 


1. Introduction 

Digital filter banks have been used in the past to decompose a signal into frequency
subbands (Crochiere & Rabiner 1983; Mintzer 1985; Smith & Barnwell 1984; Vaidyanathan
1990; Vaidyanathan 1993; Vetterli 1986). The signals in the different subbands are then
coded and transmitted. Such schemes are popular for encoding data from speech, high
quality audio, and image signals. The signal is decomposed and eventually reconstructed
by the 'analysis-synthesis' filter bank system shown in figure 5.4.1 of Vaidyanathan (1993).
In this scheme, the $H_i(z)$ are the analysis filters and the $F_i(z)$ are the synthesis filters.
The boxes labelled $\downarrow M$ denote the decimators, or subsampling devices, whereas
the boxes labelled $\uparrow M$ denote the expanders, which increase the sampling rate.
Their definitions are as in Crochiere & Rabiner (1983) and Vaidyanathan (1990).

Figure 5.5.3 of Vaidyanathan (1993) represents the subband coding scheme in terms of
the polyphase matrices. $\mathbf E(z)$ is the polyphase matrix corresponding to the analysis
filters, and $\mathbf R(z)$ is the polyphase matrix corresponding to the synthesis filters. The
decimators and expanders have been moved across the polyphase matrices using the noble
identities (Vaidyanathan 1990). It has been shown that such analysis-synthesis systems can
indeed reconstruct the original signal perfectly. One way to ensure perfect reconstruction
is to let $\mathbf R(z) = \mathbf E^{-1}(z)$; such a general scheme is called a biorthonormal
system. Another approach to designing a perfect reconstruction system with finite impulse
response (FIR) filters is to choose the matrix $\mathbf E(z)$ to be an FIR 'paraunitary' matrix.
A matrix is said to be paraunitary (Vaidyanathan 1993) if it satisfies the equation

$$\tilde{\mathbf E}(z)\,\mathbf E(z) = \mathbf I, \tag{1}$$

where $\tilde{\mathbf E}(z) = \mathbf E^{\dagger}(1/z^{*})$. The system is then guaranteed to have the
perfect reconstruction property if $\mathbf R(z) = \tilde{\mathbf E}(z)$.
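For real coefficients, $\tilde{\mathbf E}(z) = \mathbf E^T(z^{-1})$. Condition (1) then reduces to lag conditions on the coefficient matrices $\mathbf e(n)$: the sum over n of $\mathbf e(n)^T\mathbf e(n+l)$ must equal $\mathbf I$ for $l = 0$ and $\mathbf 0$ for every other lag. The sketch below checks this numerically. As a test case it uses the standard degree-one paraunitary building block $\mathbf I - \mathbf v\mathbf v^T + z^{-1}\mathbf v\mathbf v^T$ followed by a constant orthogonal matrix. The function name and the tolerance are our choices.

```python
import numpy as np

def is_paraunitary(coeffs, tol=1e-10):
    """coeffs[n] is the real M x M coefficient e(n) of E(z) = sum_n e(n) z^-n.

    For real coefficients, E~(z)E(z) = I reduces to
    sum_n e(n)^T e(n + l) = I for l = 0 and = 0 for l = 1..N
    (negative lags are transposes of positive ones).
    """
    N = len(coeffs) - 1
    M = coeffs[0].shape[0]
    for lag in range(N + 1):
        acc = sum(coeffs[n].T @ coeffs[n + lag] for n in range(N + 1 - lag))
        target = np.eye(M) if lag == 0 else np.zeros((M, M))
        if not np.allclose(acc, target, atol=tol):
            return False
    return True

rng = np.random.default_rng(0)
M = 4
S, _ = np.linalg.qr(rng.standard_normal((M, M)))      # constant orthogonal matrix
v = rng.standard_normal((M, 1))
v /= np.linalg.norm(v)
# Degree-one paraunitary block (I - v v^T + z^-1 v v^T), followed by S.
E = [(np.eye(M) - v @ v.T) @ S, v @ v.T @ S]
print(is_paraunitary(E))                              # True
print(is_paraunitary([np.eye(M), np.eye(M)]))         # False
```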

Consider the synthesis bank of figure 1 of Vaidyanathan (1993). The original signal can
be written in terms of the subband signals as

$$x(n) = \sum_{k=0}^{M-1} \sum_{m} y_k(m)\, f_k(n - Mm). \tag{2}$$

This can be viewed as a representation of the original signal in terms of a doubly indexed
set of basis functions $\eta_{k,m}(n) = f_k(n - Mm)$. It is known (Rioul & Vetterli 1991;
Soman & Vaidyanathan 1992) that this set of basis functions is orthonormal if and only if
the polyphase matrix $\mathbf R(z)$ corresponding to these filters is paraunitary.
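A small numerical illustration of (2) uses the two-channel Haar pair. It is the simplest paraunitary bank, and its polyphase matrix is the constant orthogonal matrix $\frac{1}{\sqrt2}\begin{pmatrix}1&1\\1&-1\end{pmatrix}$. The filter length equals M, so the sketch needs no boundary handling. The finite signal length and the helper name `basis` are our assumptions.

```python
import numpy as np

def basis(filters, M, length):
    """Rows are the functions eta_{k,m}(n) = f_k(n - M m) on 0 <= n < length.
    Assumes every filter has length M, so no basis function runs off the end."""
    rows = []
    for m in range(length // M):
        for f in filters:
            row = np.zeros(length)
            row[M * m: M * m + len(f)] = f
            rows.append(row)
    return np.array(rows)

haar = np.array([[1.0, 1.0], [1.0, -1.0]]) / np.sqrt(2)   # f_0, f_1 for M = 2
B = basis(haar, M=2, length=8)
x = np.arange(8.0)
y = B @ x               # subband samples y_k(m) = <x, eta_{k,m}>
x_rec = B.T @ y         # equation (2): x(n) = sum_k sum_m y_k(m) f_k(n - M m)
print(np.allclose(B @ B.T, np.eye(8)), np.allclose(x_rec, x))   # True True
```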

Another feature of the paraunitary analysis-synthesis system is that the analysis and
synthesis filters are simply time-reversed conjugate versions of each other; in particular,
they have the same length. In a practical subband coding system, both the filter
coefficients and the subband signals are quantized. It has been shown (Vaidyanathan
1993) that there exist structures which retain the paraunitary property in spite of
coefficient quantization. The perfect-reconstruction property is, however, lost when the
signals in each subband are quantized. A paraunitary system still has some important
features in the presence of subband quantization, which are listed in Soman &
Vaidyanathan (1993).

In several applications, and particularly in image coding, it is desirable for each filter in
the system to have linear phase. This would not be necessary in the absence of subband
quantization, but that case is not of practical interest. The problem of designing linear
phase perfect reconstruction systems has been considered by other authors in the past
(Princen & Bradley 1986; Vetterli & Le Gall 1989). Nguyen & Vaidyanathan (1988) have
studied the condition that the analysis (and synthesis) filters satisfy the pairwise
mirror-image symmetry constraint in the frequency domain around $\pi/2$. The advantage
of the resulting structure is that it requires fewer parameters, so the design time is
correspondingly lower than for other structures. The filter responses obtained using this
structure are also better, and they have been widely used.

A natural question is whether all three properties can be imposed on a filter bank
simultaneously. In other words, is it possible to have paraunitary filter banks whose filters
have linear phase and also satisfy the pairwise mirror-image condition in the frequency
domain? Such a filter bank, if it exists, would have all the desirable properties listed
above. In this paper, we show that such filter banks exist. The method we use is to
factorize the polyphase matrices. Our factorizations are complete: all filter banks having
these properties can be factorized in terms of our structure. Hence, the implementations
based on our factorization are robust to quantization. The three desirable properties
continue to hold even if we quantize the multipliers in our structure.

Another point worth mentioning is the minimality of our structure. A structure is said 
to be minimal if it uses the minimum number of delay elements necessary (Vaidyanathan 
1993). We will also show that our structure is minimal. 


1.1 Notations 


Bold-faced quantities denote matrices and vectors, as in $\mathbf A$ and $\mathbf x$.
$\mathbf A^T$, $\mathbf A^{-1}$ and $\mathrm{Tr}(\mathbf A)$ denote the transpose, the inverse,
and the trace of the matrix $\mathbf A$, respectively. A subscript on a matrix indicates its
size when the size is not clear from the context. Reserved symbols for special matrices are
as follows. $\mathbf I$ is the identity matrix. The matrix $\mathbf J_N$ is the anti-diagonal
matrix of size $N \times N$. For example, the anti-diagonal matrix of size 4 is

$$\mathbf J_4 = \begin{pmatrix} 0 & 0 & 0 & 1\\ 0 & 0 & 1 & 0\\ 0 & 1 & 0 & 0\\ 1 & 0 & 0 & 0 \end{pmatrix}. \tag{3}$$


$\mathbf 0$ denotes the null matrix, whose size will be clear from the context. $\mathbf V_M$
denotes a special diagonal matrix of size $M \times M$ with alternating $\pm 1$'s on the
diagonal, starting with $+1$. Hence, if $M/2$ is even, we can write

$$\mathbf V_M = \begin{pmatrix} \mathbf V_{M/2} & \mathbf 0\\ \mathbf 0 & \mathbf V_{M/2} \end{pmatrix},$$

whereas if $M/2$ is odd,

$$\mathbf V_M = \begin{pmatrix} \mathbf V_{M/2} & \mathbf 0\\ \mathbf 0 & -\mathbf V_{M/2} \end{pmatrix}.$$

The matrix $\mathbf Q$ is, by definition, $\mathbf Q = \ldots$ for even $M/2$. A superscript
asterisk, as in $f^*(n)$, denotes conjugation. Consider a transfer function $A(z)$. It can be
written in terms of its $M$ polyphase components (Vaidyanathan 1990) as follows:

$$A(z) = a_0(z^M) + z^{-1} a_1(z^M) + \cdots + z^{-(M-1)} a_{M-1}(z^M). \tag{4}$$


This is known as the Type I polyphase representation. Let $H_k(z)$, $k = 0, \ldots, M-1$,
be a set of analysis filters. They can be written as

$$H_k(z) = \sum_{l=0}^{M-1} z^{-l} E_{k,l}(z^M), \qquad k = 0, \ldots, M-1.$$

The matrix $\mathbf E(z) = [E_{k,l}(z)]$ is called the polyphase matrix of the analysis filters.
A set of filters $H_k(z)$ whose polyphase matrix is paraunitary (1) is said to form a
paraunitary system. Throughout this paper, we deal with real, causal, FIR systems. Given
such a system $\mathbf E(z)$ of order $N$, we can write it explicitly as

$$\mathbf E(z) = \mathbf e(0) + \mathbf e(1) z^{-1} + \mathbf e(2) z^{-2} + \cdots + \mathbf e(N) z^{-N}, \qquad \mathbf e(N) \neq \mathbf 0. \tag{5}$$

In this case, the analysis filters typically have order $M(N+1) - 1$.
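The notation of § 1.1 can be checked numerically. The sketch builds $\mathbf J_N$ and $\mathbf V_M$ and confirms the two block forms of $\mathbf V_M$. It then computes the Type I polyphase coefficients $\mathbf e(n)$ with entries $e_{k,l}(n) = h_k(Mn + l)$ from a set of analysis filters of length $M(N+1)$. The helper names and the toy filters are our own.

```python
import numpy as np

def J(N):
    """Anti-diagonal (reversal) matrix J_N."""
    return np.fliplr(np.eye(N))

def V(M):
    """Diagonal matrix V_M with +1, -1, +1, ... on the diagonal."""
    return np.diag([(-1.0) ** i for i in range(M)])

def block_diag2(A, B):
    return np.block([[A, np.zeros((A.shape[0], B.shape[1]))],
                     [np.zeros((B.shape[0], A.shape[1])), B]])

# V_M in terms of V_{M/2}: same sign pattern if M/2 is even, flipped if M/2 is odd.
print(np.array_equal(V(8), block_diag2(V(4), V(4))))     # True  (M/2 = 4, even)
print(np.array_equal(V(6), block_diag2(V(3), -V(3))))    # True  (M/2 = 3, odd)

def polyphase(filters, M):
    """Type I polyphase coefficients e[n, k, l] = h_k(M n + l) of the analysis bank,
    so that H_k(z) = sum_l z^-l E_kl(z^M). Filter length is assumed to be M(N+1)."""
    h = np.asarray(filters, dtype=float)
    K, L = h.shape
    if L % M:
        raise ValueError("filter length must be a multiple of M")
    return h.reshape(K, L // M, M).transpose(1, 0, 2)

e = polyphase([[1, 2, 3, 4], [5, 6, 7, 8]], M=2)
print(e.shape)              # (2, 2, 2): N = 1, so the filters have order M(N + 1) - 1 = 3
print(e[1].tolist())        # [[3.0, 4.0], [7.0, 8.0]]
print(J(4).astype(int).tolist())   # [[0, 0, 0, 1], [0, 0, 1, 0], [0, 1, 0, 0], [1, 0, 0, 0]]
```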


2. Linear phase paraunitary filters with pairwise mirror-image 
frequency responses 

We will now propose our factorization for linear-phase paraunitary matrices with pairwise 
mirror-image symmetry. We will deal with the case of M being even, which is the case 
almost always used in practice. In order to obtain factorizations of such systems, we first
need a characterization of their polyphase matrix which reflects these properties of the
individual filters. The polyphase matrix of a paraunitary system can be characterized as in
(1). Now consider a set of $M$ paraunitary transfer functions whose polyphase matrix
$\mathbf E(z)$ satisfies the property (Vetterli & Le Gall 1989)

$$\mathbf D\, z^{-N}\, \mathbf E(z^{-1})\, \mathbf J_M = \mathbf E(z), \tag{6}$$

where $N$ is the order of the paraunitary matrix $\mathbf E(z)$. Such a polyphase matrix
corresponds to a set of filters which have linear phase. The matrix $\mathbf D$ is a diagonal
matrix whose entries are $\pm 1$: $+1$ in the rows which correspond to symmetric filters,
and $-1$ in those which correspond to antisymmetric filters. The filters described by this
equation have the same centre of symmetry, $((N+1)M - 1)/2$.

The condition that the filters in the filter bank satisfy pairwise mirror-image symmetry
often leads to better filter designs and faster convergence of the optimization process. An
example of a structure satisfying these properties was given by Soman et al (19…).
Consider figure 3 of that paper. The filters at the $m$th stage are denoted by $H_{m,k}(z)$. If
these filters satisfy pairwise mirror-image symmetry, they are related to each other as

$$H_{m,M-1-k}(z) = H_{m,k}(-z), \qquad k = 0, \ldots, M-1.$$

The order of each filter is $(m+1)M - 1$. It can be verified that in this case the polyphase
matrix $\mathbf E_m(z)$ of the filters satisfies the matrix equation

z YE m (z *)V mJm = E m (z), 

where 



Under the linear phase condition mentioned earlier, the equation can be further simplified
to (assuming $M$ is even)

$$\mathbf J_M\, \mathbf E(z) = \mathbf E(z)\, \mathbf V_M. \tag{10}$$
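In terms of the coefficients $\mathbf e(n)$ of (5), the linear phase condition (6) becomes $\mathbf D\,\mathbf e(N-n)\,\mathbf J_M = \mathbf e(n)$. The mirror-image condition (10) becomes $\mathbf J_M\,\mathbf e(n) = \mathbf e(n)\,\mathbf V_M$ for every n. The sketch below checks both conditions. The two-channel Haar bank ($N = 0$, $\mathbf D = \mathrm{diag}(1, -1)$) is our own example. It satisfies both conditions together with paraunitariness.

```python
import numpy as np

def check_properties(e, D, tol=1e-10):
    """e: array (N+1, M, M) of real polyphase coefficients e(n), M even.

    linear phase (6):   D z^-N E(z^-1) J = E(z)  <=>  D e(N - n) J = e(n)
    mirror image (10):  J E(z) = E(z) V          <=>  J e(n) = e(n) V
    """
    N, M = e.shape[0] - 1, e.shape[1]
    J = np.fliplr(np.eye(M))
    V = np.diag([(-1.0) ** i for i in range(M)])
    linear = all(np.allclose(D @ e[N - n] @ J, e[n], atol=tol) for n in range(N + 1))
    mirror = all(np.allclose(J @ e[n], e[n] @ V, atol=tol) for n in range(N + 1))
    return linear, mirror

haar = np.array([[[1.0, 1.0], [1.0, -1.0]]]) / np.sqrt(2)   # N = 0, M = 2
print(check_properties(haar, D=np.diag([1.0, -1.0])))        # (True, True)
```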

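Theorem. A linear phase paraunitary matrix satisfying (6), whose filters satisfy the
additional pairwise mirror-image property in the frequency domain (10), can always be
realized as

$$\mathbf E(z) = \mathbf S\,\boldsymbol\Lambda(z)\,\mathbf P\mathbf T_1\mathbf P\,\boldsymbol\Lambda(z)\,\mathbf P\mathbf T_2\mathbf P \cdots \boldsymbol\Lambda(z)\,\mathbf P\mathbf T_N\mathbf P, \tag{11}$$

where

$$\mathbf S = \frac{1}{\sqrt 2}\begin{pmatrix} \mathbf S_0 & \mathbf 0\\ \mathbf 0 & \mathbf J_{M/2}\mathbf S_0 \end{pmatrix}\begin{pmatrix} \mathbf I_{M/2} & \mathbf J_{M/2}\\ \mathbf I_{M/2} & -\mathbf J_{M/2} \end{pmatrix}\mathbf Q, \tag{12}$$

$$\mathbf T_i = \frac12\begin{pmatrix} \mathbf I_{M/2} & \mathbf I_{M/2}\\ \mathbf I_{M/2} & -\mathbf I_{M/2} \end{pmatrix}\begin{pmatrix} \mathbf V_{M/2}\mathbf U_i\mathbf V_{M/2} & \mathbf 0\\ \mathbf 0 & \mathbf U_i \end{pmatrix}\begin{pmatrix} \mathbf I_{M/2} & \mathbf I_{M/2}\\ \mathbf I_{M/2} & -\mathbf I_{M/2} \end{pmatrix}, \tag{13}$$

and

$$\boldsymbol\Lambda(z) = \begin{pmatrix} \mathbf I_{M/2} & \mathbf 0\\ \mathbf 0 & z^{-1}\mathbf I_{M/2} \end{pmatrix}, \tag{14}$$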

rat ioi ituuuri uj paraimuary juier oariKS 


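$$\mathbf P = \begin{pmatrix} \mathbf I_{M/2} & \mathbf 0\\ \mathbf 0 & \mathbf J_{M/2} \end{pmatrix}. \tag{15}$$

$\mathbf S_0$ and $\mathbf U_i$ are arbitrary orthogonal matrices, and $\mathbf Q$ is a symmetric
permutation.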


To develop a cascade structure which generates such filters, we assume that we have a
paraunitary matrix $\mathbf E_{m-1}(z)$ of order $m-1$ satisfying the conditions of
paraunitariness (1), linear phase (6), and pairwise mirror-image symmetry of frequency
responses (10). From it, we show how to obtain a paraunitary matrix $\mathbf E_m(z)$ of order
$m$ that satisfies the same three properties. We do this by post-multiplying the given
matrix $\mathbf E_{m-1}(z)$ by a paraunitary matrix $\mathbf R(z)$ of order one.

Let

$$\mathbf E_m(z) = \mathbf E_{m-1}(z)\,\mathbf R(z). \tag{16}$$

Clearly, $\mathbf E_m(z)$ is paraunitary if $\mathbf E_{m-1}(z)$ and $\mathbf R(z)$ are both
paraunitary. Also,

$$\mathbf E_{m-1}(z) = \mathbf E_m(z)\,\tilde{\mathbf R}(z). \tag{17}$$


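Propagating the linear phase property: Since $\mathbf E_{m-1}(z)$ satisfies the linear phase
property, substituting from (17) gives

$$z^{-(m-1)}\,\mathbf D\,\mathbf E_m(z^{-1})\,\tilde{\mathbf R}(z^{-1})\,\mathbf J_M = \mathbf E_m(z)\,\tilde{\mathbf R}(z), \tag{18}$$

i.e.,

$$z^{-(m-1)}\,\mathbf D\,\mathbf E_m(z^{-1})\,\tilde{\mathbf R}(z^{-1})\,\mathbf J_M\,\mathbf R(z) = \mathbf E_m(z). \tag{19}$$

Hence, for $\mathbf E_m(z)$ to satisfy the linear phase property, $\mathbf R(z)$ should satisfy

$$\tilde{\mathbf R}(z^{-1})\,\mathbf J_M\,\mathbf R(z) = z^{-1}\mathbf J_M. \tag{20}$$

It can be verified that this equation is satisfied if $\mathbf R(z) = \boldsymbol\Lambda(z)\mathbf P\mathbf T_m\mathbf P$,
with the matrices $\boldsymbol\Lambda(z)$, $\mathbf P$ and $\mathbf T_m$ as stated in the theorem.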

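Propagating the pairwise mirror-image property in the frequency domain: Assuming that
(10) holds for $\mathbf E_{m-1}(z)$, and using (17), we get

$$\mathbf J_M\,\mathbf E_m(z)\,\tilde{\mathbf R}(z) = \mathbf E_m(z)\,\tilde{\mathbf R}(z)\,\mathbf V_M, \tag{21}$$

i.e.,

$$\mathbf J_M\,\mathbf E_m(z) = \mathbf E_m(z)\,\tilde{\mathbf R}(z)\,\mathbf V_M\,\mathbf R(z). \tag{22}$$

Hence $\mathbf R(z)$ should satisfy the property

$$\tilde{\mathbf R}(z)\,\mathbf V_M\,\mathbf R(z) = \mathbf V_M. \tag{23}$$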


We now have two cases. In Case 1, $M/2$ is even and
$\mathbf V_M = \begin{pmatrix} \mathbf V_{M/2} & \mathbf 0\\ \mathbf 0 & \mathbf V_{M/2} \end{pmatrix}$. In Case 2,
$M/2$ is odd and $\mathbf V_M = \begin{pmatrix} \mathbf V_{M/2} & \mathbf 0\\ \mathbf 0 & -\mathbf V_{M/2} \end{pmatrix}$.
In either case, we use the fact that $\mathbf R(z) = \boldsymbol\Lambda(z)\mathbf P\mathbf T_m\mathbf P$, with the
various matrices as in the statement of the theorem. Upon simplification, we can verify
that (23) is indeed satisfied. Thus all three properties (paraunitariness, linear phase, and
pairwise mirror-image symmetry) are satisfied by the new matrix $\mathbf E_m(z)$.
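The claim that $\mathbf R(z) = \boldsymbol\Lambda(z)\mathbf P\mathbf T_m\mathbf P$ satisfies paraunitariness, (20) and (23) can be spot-checked numerically. The check writes $\mathbf R(z) = \mathbf r_0 + \mathbf r_1 z^{-1}$ and compares coefficients power by power. It uses our reading of the factors: $\boldsymbol\Lambda(z) = \mathrm{diag}(\mathbf I, z^{-1}\mathbf I)$, $\mathbf P = \mathrm{diag}(\mathbf I, \mathbf J_{M/2})$, and $\mathbf T_m = \frac12\begin{pmatrix}\mathbf I&\mathbf I\\ \mathbf I&-\mathbf I\end{pmatrix}\mathrm{diag}(\mathbf V_{M/2}\mathbf U\mathbf V_{M/2}, \mathbf U)\begin{pmatrix}\mathbf I&\mathbf I\\ \mathbf I&-\mathbf I\end{pmatrix}$ with a random orthogonal $\mathbf U$. This is a sanity check on random instances, not a proof.

```python
import numpy as np

def build_R(M, rng):
    """R(z) = Lambda(z) P T P as coefficients r0 + r1 z^-1 (our reading of the factors)."""
    h = M // 2
    I, Z = np.eye(h), np.zeros((h, h))
    Jh = np.fliplr(I)
    Vh = np.diag([(-1.0) ** i for i in range(h)])
    U, _ = np.linalg.qr(rng.standard_normal((h, h)))        # random orthogonal U
    B = np.block([[I, I], [I, -I]])
    T = 0.5 * B @ np.block([[Vh @ U @ Vh, Z], [Z, U]]) @ B
    P = np.block([[I, Z], [Z, Jh]])
    L0 = np.block([[I, Z], [Z, Z]])   # coefficient of z^0 in Lambda(z)
    L1 = np.block([[Z, Z], [Z, I]])   # coefficient of z^-1 in Lambda(z)
    return L0 @ P @ T @ P, L1 @ P @ T @ P

def check_R(M, seed=0):
    r0, r1 = build_R(M, np.random.default_rng(seed))
    J = np.fliplr(np.eye(M))
    V = np.diag([(-1.0) ** i for i in range(M)])
    ok = np.allclose
    paraunitary = ok(r0.T @ r0 + r1.T @ r1, np.eye(M)) and ok(r0.T @ r1, 0)
    # (20): R~(z^-1) J R(z) = z^-1 J, compared power by power of z^-1
    linear = (ok(r0.T @ J @ r0, 0) and ok(r1.T @ J @ r1, 0)
              and ok(r0.T @ J @ r1 + r1.T @ J @ r0, J))
    # (23): R~(z) V R(z) = V
    mirror = ok(r0.T @ V @ r0 + r1.T @ V @ r1, V) and ok(r0.T @ V @ r1, 0)
    return paraunitary, linear, mirror

for M in (4, 6, 8):
    print(M, check_R(M))   # (True, True, True) for each M, covering both even and odd M/2
```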


Initialization: It only remains to find a degree-zero paraunitary matrix $\mathbf E_0(z)$
(i.e., a constant orthogonal matrix $\mathbf S$) to initialize the above process. It can be
verified that the matrix $\mathbf S$ satisfies the linear phase property
($\mathbf D\mathbf S\mathbf J_M = \mathbf S$) if it is of the form




/i a soman 


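$$\mathbf S = \frac{1}{\sqrt 2}\begin{pmatrix} \mathbf S_0 & \mathbf 0\\ \mathbf 0 & \mathbf S_1 \end{pmatrix}\begin{pmatrix} \mathbf I_{M/2} & \mathbf J_{M/2}\\ \mathbf I_{M/2} & -\mathbf J_{M/2} \end{pmatrix}\mathbf Q, \tag{24}$$

where $\mathbf Q$ is a symmetric permutation matrix. This is because
$\mathbf Q\mathbf J_M\mathbf Q = \mathbf J_M$ for any such permutation matrix. Let $\mathbf Q$ be
chosen so that $\mathbf Q\mathbf V_M\mathbf Q = \mathbf D$, where

$$\mathbf D = \begin{pmatrix} \mathbf I_{M/2} & \mathbf 0\\ \mathbf 0 & -\mathbf I_{M/2} \end{pmatrix}.$$

Now let $\mathbf S' = \mathbf S\mathbf Q$. For the matrix $\mathbf S$ to satisfy the pairwise
mirror-image property ($\mathbf J_M\mathbf S = \mathbf S\mathbf V_M$), it can be verified that the
matrix $\mathbf S'$ should satisfy $\mathbf S'\mathbf D\mathbf S'^T = \mathbf J_M$. Substituting the
forms of the various matrices and simplifying, we get

$$\begin{pmatrix} \mathbf 0 & \mathbf S_0\mathbf S_1^T\\ \mathbf S_1\mathbf S_0^T & \mathbf 0 \end{pmatrix} = \begin{pmatrix} \mathbf 0 & \mathbf J_{M/2}\\ \mathbf J_{M/2} & \mathbf 0 \end{pmatrix}. \tag{25}$$

This equation can be satisfied by letting $\mathbf S_0$ be an arbitrary orthogonal matrix and
choosing $\mathbf S_1 = \mathbf J_{M/2}\mathbf S_0$. Thus the matrix $\mathbf S$ can be realized with
$\binom{M/2}{2}$ rotations (Murnaghan 1962).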
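For $M = 4$ the initialization can be checked directly. The sketch follows our reading of (24): $\mathbf S_0$ is a random orthogonal matrix, $\mathbf S_1 = \mathbf J_{M/2}\mathbf S_0$ as chosen in the text, and $\mathbf D = \mathrm{diag}(\mathbf I, -\mathbf I)$. The particular $\mathbf Q$ is our own choice: the symmetric permutation that swaps the second and third coordinates. It maps $\mathbf V_4$ to $\mathbf D$.

```python
import numpy as np

M, h = 4, 2
I, Z = np.eye(h), np.zeros((h, h))
Jh, JM = np.fliplr(I), np.fliplr(np.eye(M))
VM = np.diag([(-1.0) ** i for i in range(M)])
D = np.block([[I, Z], [Z, -I]])
Q = np.eye(M)[[0, 2, 1, 3]]              # symmetric permutation with Q V_M Q = D (M = 4)

S0, _ = np.linalg.qr(np.random.default_rng(1).standard_normal((h, h)))
S1 = Jh @ S0                              # the choice S_1 = J_{M/2} S_0 made in the text
S = np.block([[S0, Z], [Z, S1]]) @ np.block([[I, Jh], [I, -Jh]]) @ Q / np.sqrt(2)

print(np.allclose(Q @ VM @ Q, D),         # Q maps V_M to D
      np.allclose(S @ S.T, np.eye(M)),    # S is orthogonal
      np.allclose(D @ S @ JM, S),         # linear phase:  D S J_M = S
      np.allclose(JM @ S, S @ VM))        # mirror image:  J_M S = S V_M
# prints: True True True True
```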


Proof of completeness. In this section, we start with a paraunitary matrix $\mathbf E_m(z)$ of
order $m$ satisfying the conditions of paraunitariness (1), linear phase (6), and pairwise
mirror-image symmetry of frequency responses (10). From it, we show that the matrix
$\mathbf E_{m-1}(z)$ of order $m-1$ defined as in (17) satisfies all three of these properties.
For brevity, we discuss only the case of even $M/2$, since the proofs for odd $M/2$ are
identical except for bookkeeping.

The paraunitariness of $\mathbf E_{m-1}(z)$ is obvious from (17) itself.

Linear phase property: Since $\mathbf E_m(z)$ satisfies the linear phase property, we have

$$z^{-m}\,\mathbf D\,\mathbf E_m(z^{-1})\,\mathbf J_M = \mathbf E_m(z), \tag{26}$$

i.e.,

$$z^{-m}\,\mathbf D\,\mathbf E_{m-1}(z^{-1})\,\mathbf R(z^{-1})\,\mathbf J_M\,\tilde{\mathbf R}(z) = \mathbf E_{m-1}(z). \tag{27}$$

Consider the form of the matrix $\mathbf R(z) = \boldsymbol\Lambda(z)\mathbf P\mathbf T_m\mathbf P$ and its
component matrices, as given in the statement of the theorem. One can verify that the
above simplifies to

$$z^{-(m-1)}\,\mathbf D\,\mathbf E_{m-1}(z^{-1})\,\mathbf J_M = \mathbf E_{m-1}(z), \tag{28}$$

implying that $\mathbf E_{m-1}(z)$ has linear phase.

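Pairwise mirror-image property in the frequency domain: Assuming that (10) holds for
$\mathbf E_m(z)$, and using (16), we get

$$\mathbf J_M\,\mathbf E_{m-1}(z)\,\mathbf R(z) = \mathbf E_{m-1}(z)\,\mathbf R(z)\,\mathbf V_M, \tag{29}$$

i.e.,

$$\mathbf J_M\,\mathbf E_{m-1}(z) = \mathbf E_{m-1}(z)\,\mathbf R(z)\,\mathbf V_M\,\tilde{\mathbf R}(z). \tag{30}$$

Again, from $\mathbf R(z) = \boldsymbol\Lambda(z)\mathbf P\mathbf T_m\mathbf P$ and its component matrices as
in the statement of the theorem, it can be verified that

$$\mathbf R(z)\,\mathbf V_M\,\tilde{\mathbf R}(z) = \mathbf V_M, \tag{31}$$

whether $M/2$ is odd or even.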

Causality: We now have to show that there exists a matrix $\mathbf R(z)$ of the form
$\mathbf R(z) = \boldsymbol\Lambda(z)\mathbf P\mathbf T_m\mathbf P$ such that $\mathbf E_{m-1}(z)$
obtained from (17) is causal. The linear phase property, the pairwise mirror-image
property, and the paraunitary property continue to hold for the reduced system as long as
$\mathbf R(z)$ is any matrix of this required form. Indeed, it is the causality condition on the
reduced system which determines the particular choice of $\mathbf R(z)$. Since
$\mathbf E_{m-1}(z) = \mathbf E_m(z)\tilde{\mathbf R}(z)$, and knowing the form of $\mathbf R(z)$, we get

$$\mathbf E_{m-1}(z) = \mathbf E_m(z)\,\mathbf P\mathbf T_m^T\mathbf P\,\boldsymbol\Lambda^{-1}(z). \tag{32}$$

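Using the form of $\boldsymbol\Lambda(z)$ from the statement of the theorem, we get

$$\mathbf E_{m-1}(z) = \mathbf E_m(z)\,\mathbf P\mathbf T_m^T\mathbf P\begin{pmatrix} \mathbf I_{M/2} & \mathbf 0\\ \mathbf 0 & \mathbf 0 \end{pmatrix} + \mathbf E_m(z)\,\mathbf P\mathbf T_m^T\mathbf P\begin{pmatrix} \mathbf 0 & \mathbf 0\\ \mathbf 0 & z\,\mathbf I_{M/2} \end{pmatrix}. \tag{33}$$

The second term on the right-hand side of this equation is responsible for the
noncausality. In particular, the noncausal part of the second term is $z$ times

$$\mathbf e_m(0)\,\mathbf P\mathbf T_m^T\mathbf P\begin{pmatrix} \mathbf 0 & \mathbf 0\\ \mathbf 0 & \mathbf I_{M/2} \end{pmatrix}, \tag{34}$$

where we have used an expansion similar to (5) for $\mathbf E_m(z)$.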

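We have to show that there exists a matrix $\mathbf T_m$ of the form in (13) which makes this
term equal to zero. Let us simplify the term that needs to be made zero. Substituting the
forms of the matrices $\mathbf P$ and $\mathbf T_m$ from (15) and (13), the noncausal part becomes

$$\mathbf e_m(0)\,\mathbf P\mathbf T_m^T\mathbf P\begin{pmatrix} \mathbf 0 & \mathbf 0\\ \mathbf 0 & \mathbf I_{M/2} \end{pmatrix} = \frac12\,\mathbf e_m(0)\begin{pmatrix} \mathbf 0 & (\mathbf V_{M/2}\mathbf U_m^T\mathbf V_{M/2} - \mathbf U_m^T)\,\mathbf J_{M/2}\\ \mathbf 0 & \mathbf J_{M/2}(\mathbf V_{M/2}\mathbf U_m^T\mathbf V_{M/2} + \mathbf U_m^T)\,\mathbf J_{M/2} \end{pmatrix}. \tag{35}$$

Hence we need to show the existence of an orthogonal matrix $\mathbf U_m$ such that

$$\mathbf e_m(0)\begin{pmatrix} (\mathbf V_{M/2}\mathbf U_m^T\mathbf V_{M/2} - \mathbf U_m^T)\,\mathbf J_{M/2}\\ \mathbf J_{M/2}(\mathbf V_{M/2}\mathbf U_m^T\mathbf V_{M/2} + \mathbf U_m^T)\,\mathbf J_{M/2} \end{pmatrix} = \mathbf 0. \tag{36}$$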
Notice that the matrix multiplying $\mathbf e_m(0)$ in (36) has $M$ rows but only $M/2$
columns. We now show that such a matrix $\mathbf U_m$ always exists. First, let us examine
the above equation closely. Given the nature of the matrix $\mathbf V_{M/2}$, the matrix that
multiplies $\mathbf e_m(0)$ in (36) is of the form

$$\begin{pmatrix} \times & 0 & \times & 0 & \cdots & \times & 0\\ 0 & \times & 0 & \times & \cdots & 0 & \times\\ \vdots & & & & & & \vdots\\ \times & 0 & \times & 0 & \cdots & \times & 0\\ 0 & \times & 0 & \times & \cdots & 0 & \times \end{pmatrix}, \tag{37}$$

where $\times$ represents a generally nonzero entry. The matrix thus has a checkerboard
pattern. Now, the paraunitary condition in the time domain implies that
$\mathbf e_m(0)\,\mathbf e_m^T(m) = \mathbf 0$. Also, (6) in particular means that

$$\mathbf D\,\mathbf e_m(0)\,\mathbf J_M = \mathbf e_m(m). \tag{38}$$





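Hence we have

$$\mathbf e_m(0)\,\mathbf J_M\,\mathbf e_m^T(0) = \mathbf 0. \tag{39}$$

By Sylvester's rank inequality, applied to the product of $\mathbf e_m(0)$ and
$\mathbf J_M\mathbf e_m^T(0)$ (each of rank $r$), we get $2r - M \le 0$. In other words,
$\mathrm{rank}(\mathbf e_m(0)) = r \le M/2$. On the other hand, (10) in particular implies that
$\mathbf J_M\mathbf e_m(0) = \mathbf e_m(0)\mathbf V_M$. Combining this with the paraunitary condition,
we get

$$\mathbf e_m(0)\,\mathbf J_M\mathbf V_M\,\mathbf e_m^T(0) = \mathbf 0. \tag{40}$$

Equations (36) and (37) tell us that we need to prove the existence of $M/2$ checkered
vectors that are orthogonal to the rows of $\mathbf e_m(0)$. Equation (39) implies that

$$\mathbf W\,\mathbf e_m(0)\,\mathbf J\,\mathbf e_m^T(0)\,\mathbf W^T = \mathbf 0 \tag{41}$$

for any matrix $\mathbf W$. For the same matrix $\mathbf W$, (40) implies that

$$\mathbf W\,\mathbf e_m(0)\,\mathbf J\mathbf V\,\mathbf e_m^T(0)\,\mathbf W^T = \mathbf 0. \tag{42}$$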

Let the matrix $\mathbf W$ be chosen so that the first $r$ rows of the matrix
$\mathbf W\mathbf e_m(0)$ form an orthonormal basis of real vectors $\mathbf x_i$ for the rows of
the matrix $\mathbf e_m(0)$. (This is possible since $\mathbf e_m(0)$ is itself real and has rank $r$.)
These vectors, being orthonormal, satisfy $\mathbf x_i\mathbf x_j^T = 0$ for $i \ne j$. Equations (41)
and (42) tell us that the same vectors also satisfy the conditions $\mathbf x_i\mathbf J\mathbf x_j^T = 0$
and $\mathbf x_i\mathbf J\mathbf V\mathbf x_j^T = 0$ for all $i, j$.

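Each of these vectors can be written as $\mathbf x_i = \mathbf a_i + \mathbf b_i$, where the two
component vectors have the checkered form

$$\mathbf a_i = (x_{i,0},\ 0,\ x_{i,2},\ 0,\ \ldots,\ x_{i,M-2},\ 0), \qquad \mathbf b_i = (0,\ x_{i,1},\ 0,\ x_{i,3},\ \ldots,\ 0,\ x_{i,M-1}). \tag{43}$$

Hence the vectors $\mathbf a_i$ and $\mathbf b_i$ have alternate entries equal to zero, with the
entries that are zero in $\mathbf a_i$ being nonzero in $\mathbf b_i$, and vice versa. Moreover, since
the $\mathbf x_i$ are real, so are the $\mathbf a_i$ and $\mathbf b_i$. Then, from (41) we have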

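$$(\mathbf a_i + \mathbf b_i)\,\mathbf J\,(\mathbf a_j + \mathbf b_j)^T = 0. \tag{44}$$

Since $M$ is even, $\mathbf J$ maps even-indexed entries to odd-indexed ones. Hence
$\mathbf a_i\mathbf J\mathbf a_j^T = \mathbf b_i\mathbf J\mathbf b_j^T = 0$, and (44) simplifies to

$$\mathbf a_i\,\mathbf J\,\mathbf b_j^T + \mathbf b_i\,\mathbf J\,\mathbf a_j^T = 0. \tag{45}$$

From (42) we have

$$(\mathbf a_i + \mathbf b_i)\,\mathbf J\mathbf V\,(\mathbf a_j + \mathbf b_j)^T = 0. \tag{46}$$

But for even $M$, $\mathbf V\mathbf a_j^T = \mathbf a_j^T$ and $\mathbf V\mathbf b_j^T = -\mathbf b_j^T$. Hence the
above equation becomes

$$-\mathbf a_i\,\mathbf J\,\mathbf b_j^T + \mathbf b_i\,\mathbf J\,\mathbf a_j^T = 0. \tag{47}$$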



Using (45) and (47), we get $\mathbf a_i\mathbf J\mathbf b_j^T = 0$ for all $i, j$. This means that the
rows of the matrix $\mathbf e_m(0)$ are orthogonal to any linear combination of the $\mathbf a_i$, and
to any linear combination of the $\mathbf b_i$, for $i = 1, \ldots, r$.

Now choose $r$ linear combinations of the $\mathbf a_i$ which are mutually orthogonal, and $r$
linear combinations of the $\mathbf b_i$ which are mutually orthogonal. This gives $2r$
orthogonal vectors, each of them checkered (since the $\mathbf a_i$ and $\mathbf b_i$ are
checkered).

Denote by $\varepsilon_1$ the subspace spanned by all the checkered vectors whose odd
entries are zero; for example, the vectors $\mathbf a_i$ lie in $\varepsilon_1$. Since the dimension
of $\varepsilon_1$ is $M/2$, this subspace contains $M/2 - r$ vectors which are orthogonal to
these $r$ vectors $\mathbf a_i$. Similarly, denote by $\varepsilon_2$ the subspace spanned by all the
checkered vectors whose even entries are zero; for example, the vectors $\mathbf b_i$ lie in
$\varepsilon_2$. The subspace $\varepsilon_2$ contains $M/2 - r$ vectors which are orthogonal to these $r$
vectors $\mathbf b_i$, since its dimension is also $M/2$.

Using these two sets of vectors along with the original $2r$ vectors gives us an
orthogonal basis of $M/2$ checkered vectors that are orthogonal to the rows of the matrix
$\mathbf e_m(0)$. Clearly, the matrix with these $M/2$ vectors as its columns is a checkered
matrix of the required form (37). It is also easily verified that flipping the rows of the
bottom half of this matrix, and then subtracting the bottom half from the top half, gives
an $M/2 \times M/2$ orthogonal matrix, which determines the orthogonal matrix $\mathbf U_m$ in
(36). Hence we have shown that there always exists a matrix $\mathbf T_m$ of the form
mentioned in the statement of the theorem such that the reduced system is causal.

Order reduction: Given the fact that $E_m(z)$ is causal, and that it satisfies (6), we can see that the order of $E_m(z)$ is $m$. Thus there is a reduction in order by 1. Hence, for a system of order $N$, the factorization process is guaranteed to terminate in $N$ steps.

This concludes the proof of the theorem. □


In the above theorem, each orthogonal building block can be realized in terms of rotation angles (variables).

The degree of a causal rational system is defined (Vaidyanathan 1993) as the minimum number of delays required for its implementation. A structure is said to be minimal if the number of delays used is equal to the degree of the transfer function. For a paraunitary system, we know that (theorem 14.7.1 of Vaidyanathan 1993)

$$ \deg[\det E(z)] = \deg[E(z)]. \qquad (48) $$


In our case,

$$ \deg[\det E(z)] = \deg\big[\det\big(S\,\Lambda(z)\,P\,T_0\,\Lambda(z) \cdots \Lambda(z)\,T_{N-1}\,P\big)\big] = NM/2, \qquad (49) $$

which is equal to the number of delays used. Hence our factorization is minimal.


Conclusions

In this paper, we proposed and proved a factorization for linear-phase paraunitary filter banks whose filters have pairwise mirror-image symmetry in the frequency domain. In a paraunitary system, the analysis and synthesis filters are time-reversed versions of one another. Hence both the analysis and synthesis banks can be factored in terms of this structure. The filter banks realized using this factorization have the following properties:

• The filter bank is paraunitary, and therefore gives perfect reconstruction.

• The analysis and synthesis filters are time-reversed versions of each other.

• The analysis and synthesis filters are all linear phase.

• The filters in the analysis and the synthesis banks both satisfy the pairwise mirror-image property in the frequency domain.

It is interesting to note that the linear-phase property together with the paraunitary condition implies that the analysis and synthesis banks are identical up to a multiplier of ±1 on some of the filters, i.e., $F_i(z) = \pm H_i(z)$. This property is useful because the same filter bank can then be used for both analysis and synthesis, as in applications where encoding and decoding are performed by the same system.

The variable matrices in our system are shown to be orthogonal matrices of a certain form. These orthogonal matrices can be characterised in terms of a fixed number of rotation angles, which are the actual variables in the design process. Our factorization also gives a lattice structure for implementing these filter banks. If these angles are used as the multipliers in the implementation, the resulting structure is robust to multiplier quantization: the properties of paraunitariness, linear phase, and pairwise mirror-image symmetry are retained no matter what the values of the angles are.
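The robustness argument can be illustrated with a generic example. This sketch does not reproduce the paper's specific building blocks. It assumes a plain product of plane (Givens) rotations, one angle per coordinate pair, and quantizes the angles rather than the matrix entries. Every factor is a rotation whatever value its angle takes, so the product stays orthogonal up to floating-point rounding.

```python
import math


def identity(n):
    return [[1.0 if r == c else 0.0 for c in range(n)] for r in range(n)]


def matmul(A, B):
    return [[sum(A[r][k] * B[k][c] for k in range(len(B))) for c in range(len(B[0]))]
            for r in range(len(A))]


def transpose(A):
    return [list(col) for col in zip(*A)]


def rotation(n, p, q, theta):
    """Plane (Givens) rotation by theta acting on coordinates p and q."""
    G = identity(n)
    c, s = math.cos(theta), math.sin(theta)
    G[p][p], G[p][q], G[q][p], G[q][q] = c, -s, s, c
    return G


def orthogonal_from_angles(n, angles):
    """Product of n(n-1)/2 plane rotations, one angle per coordinate pair p < q."""
    pairs = [(p, q) for p in range(n) for q in range(p + 1, n)]
    if len(angles) != len(pairs):
        raise ValueError("expected n(n-1)/2 angles")
    Q = identity(n)
    for (p, q), theta in zip(pairs, angles):
        Q = matmul(Q, rotation(n, p, q, theta))
    return Q


def quantize(theta, step):
    """Round an angle to the nearest multiple of `step` (a crude fixed-point model)."""
    return step * round(theta / step)
```

For example, `orthogonal_from_angles(3, [quantize(t, 2 ** -4) for t in (0.3, -1.1, 2.0)])` is still orthogonal even though the angles have been coarsely rounded.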
Wavelet packet based channel equalization

Abstract. Recently, a considerable amount of attention has been given to the field of wavelets and wavelet packets, which have found numerous applications in signal representation, image compression and applied mathematics.

In this paper, we present a channel equalization method based on wavelet packets. The proposed equalizer structure is based on the fact that, for sufficiently narrowband sequences, a non-ideal channel can be modelled as an attenuation and a delay. If the data sequence is used to modulate a set of narrowband wavelet packets, then no equalization is required at the receiver end, and the equalization problem reduces to that of determining the delay introduced by the channel for each of the wavelet packets. A minimum square variance algorithm for adaptively choosing the delay is proposed. The algorithm is shown analytically to perform as desired in the case of a simple delay channel. Its performance for non-ideal channels is studied through simulations, and the results corroborate the theoretical predictions.

Keywords. Channel equalization; wavelets and wavelet packets. 


1. Introduction 

Practical communication channels are noisy and band-limited. Hence, the received sequence is usually an attenuated, delayed and distorted version of the transmitted sequence (besides the noise introduced by the channel). When a stream of symbols is transmitted over the channel, the distortion results in interference between neighbouring symbols. This inter-symbol interference (ISI) is primarily due to the band-limited nature of the channel. A filter or signal processing algorithm, called an equalizer, is required at the receiver end to remove (or minimize) the effects of ISI. The parameters of the equalizer are adjusted on the basis of measurements of the channel characteristics (Proakis 1983; Qureshi 1985). These measurements can be made by initially transmitting a training sequence which is known to the receiver. Alternatively, in blind equalization schemes (Benveniste & Goursat 1984), measurements made on the received sequence itself are used to estimate the channel characteristics.

The complexity of the equalizer is substantial for channels with severe ISI. To reduce the complexity of the receiver, the data symbols can be used to modulate a narrow-band carrier which is then transmitted over the channel. Since a channel behaves like an ideal delay channel in a sufficiently narrow band, the narrow-band carrier suffers much less distortion, thereby requiring less compensation and a less complex equalizer. However, a single narrow-band carrier would use only a fraction of the available channel bandwidth, which would mean transmitting the data at rates much lower than is possible. There are two complementary approaches to increasing the data rate for a given bit error probability.

The first approach to increasing the data rate is to use multi-level amplitude (M-ary) modulation of the carrier. The carrier takes $M$ possible signal amplitudes, corresponding to $M = 2^k$ possible $k$-bit symbols. The increase in data rate, by a factor of $k = \log_2 M$ over binary signalling, is gained at the expense of increased signal power; the bandwidth utilization remains the same. Quadrature amplitude modulation is an efficient method to trade off data rate against signal power.
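The bits-per-symbol bookkeeping can be made concrete with a minimal sketch of M-ary amplitude mapping. It assumes equally spaced levels ±a, ±3a, … and a natural-binary labelling (Gray labelling is a common alternative); neither choice comes from the text. Here $M$ counts amplitude levels, not the sub-bands used later in the paper. Each group of $k$ bits becomes one symbol, while the largest amplitude grows to $(M - 1)a$, which is where the extra signal power goes.

```python
def bits_to_pam(bits, k, a=1.0):
    """Map groups of k bits to one of M = 2**k equally spaced amplitudes.

    Natural binary labelling (an assumption): index 0 -> -(M-1)a, ..., M-1 -> +(M-1)a.
    """
    M = 2 ** k
    if len(bits) % k:
        raise ValueError("number of bits must be a multiple of k")
    symbols = []
    for i in range(0, len(bits), k):
        index = int("".join(str(b) for b in bits[i:i + k]), 2)  # 0 .. M-1
        symbols.append(a * (2 * index - (M - 1)))
    return symbols
```

With `k = 2` (four levels), the six bits `[0, 0, 1, 1, 1, 0]` become three symbols, so each transmission carries two bits.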

The second approach to increase the data rate is to use multiple carriers, each occupying 
different regions of the channel bandwidth. Here, the increase in data rate is gained at 
the expense of greater bandwidth utilization. An orthonormal set of carriers would, in 
general, offer the best performance. A number of orthonormal sets have been suggested in 
the literature. The discrete multi-tone (DMT) (Chow 1992, ch. 2-4) system, for example, 
uses the Fourier basis sequences as the orthonormal set. 
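The orthonormality of the DMT carriers is easy to verify numerically. The sketch below is idealized: it uses unwindowed length-$N$ complex exponentials and ignores the cyclic prefix and FFT implementation that a practical DMT system would use. It builds the Gram matrix of the carriers, which should be the identity.

```python
import cmath
import math


def dft_basis(N):
    """The N carriers e_k(n) = exp(j 2 pi k n / N) / sqrt(N), n = 0..N-1."""
    return [[cmath.exp(2j * math.pi * k * n / N) / math.sqrt(N) for n in range(N)]
            for k in range(N)]


def inner(u, v):
    """Complex inner product <u, v> = sum_n u(n) conj(v(n))."""
    return sum(p * q.conjugate() for p, q in zip(u, v))


N = 8
basis = dft_basis(N)
gram = [[inner(basis[k], basis[l]) for l in range(N)] for k in range(N)]
```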

Recently, a considerable amount of attention has been given to the field of wavelets and wavelet packets. Wavelet theory provides a unified framework for a number of signal processing techniques which had been developed independently. It has found numerous applications in signal representation, image compression and applied mathematics (Coombes et al 1989; IEEE 1992).

Whereas the Fourier basis sequences are all of equal bandwidth, wavelet packets are a generalization to the unequal bandwidth case. Here, we present a channel equalization method based on wavelet packets. The ability to select the bandwidths of the carriers could conceivably be used to improve the efficiency of the DMT system, though the DMT system has the advantage of a number of fast algorithms for its implementation.

2. Problem statement 

The problem of designing an equalizer and then adaptively choosing the equalizer parameters is a classic one. Recently, with the development of wavelets and renewed interest in multirate systems, a number of adaptive equalization algorithms using sub-band concepts have been proposed (Gilloire & Vetterli 1992; Shynk 1992; Sathe & Vaidyanathan 1993). These algorithms are based on splitting the output signal of the channel into sub-bands, applying standard adaptive equalization algorithms in each sub-band and then recombining the sub-bands to generate the equalized output. The sub-band scheme has greater computational efficiency than the full-band scheme. Furthermore, the convergence speed is improved, as the adaptation step size can be matched to the energy distribution of the input signal in each band. However, if decimation is done close to the maximal rate in an attempt to reduce the number of computations, then the performance deteriorates.

Here, we approach the problem of channel equalization using wavelet packets in a different way. Any channel response can be sub-divided into a set of (possibly unequal) regions where its behaviour closely approximates that of the ideal delay channel. Since a wavelet packet is essentially a narrow-band sequence, a suitably designed packet would be essentially undistorted by passage through the channel. If a data bit is used to switch the polarity of the packet, then at the receiver the bit could be recovered by a simple matched filter-sampler combination.

Since wavelet packets can be designed with finite support, a data sequence could be transmitted over the channel by using time-delayed (shifted) versions of the same wavelet packet. If the delay between two successive wavelet packets is sufficient, the overlap between them is minimal, and at the receiver the data sequence can easily be identified and recovered. However, to increase the data rate, one would like to reduce the delay between the transmission of successive wavelet packets to a minimum. But as the delay is reduced, it becomes increasingly difficult to pick the correct sample at the matched filter output, because of the increased overlap between the wavelet packet and its shifted versions.

An important property of a wavelet packet is that it is orthogonal to versions of itself shifted by multiples of $n_k$, where $n_k$ is the decimation factor associated with the wavelet packet in the $k$th sub-band. Thus, if we reduce the delay between successive wavelet packets to $n_k$, this property can be exploited by the receiver to recover the transmitted sequence despite large amounts of overlap. Picking the correct sample is equivalent to estimating the delay introduced by the channel before decimating the matched filter output. Thus, if we use the data sequence to modulate a set of wavelet packets and their shifted versions, no equalization is required at the receiver end: a simple delay and matched filter combination followed by a decimator suffices. The equalization problem reduces to that of determining the delay introduced by the channel for each of the wavelet packets.
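A short numerical check of this property follows. It assumes the simplest possible wavelet packet set: a two-band bank ($M = 2$, $n_0 = n_1 = 2$) built from the Daubechies length-4 lowpass filter and its alternating-flip highpass partner. These filters are a stand-in, not the ones used in the paper. Shifts by multiples of $n_k = 2$ give zero inner products even though the packets overlap, while a one-sample shift does not.

```python
import math

SQ3 = math.sqrt(3.0)
# Daubechies length-4 lowpass filter, normalized to unit energy.
F0 = [c / (4.0 * math.sqrt(2.0)) for c in (1 + SQ3, 3 + SQ3, 3 - SQ3, 1 - SQ3)]
# Alternating-flip highpass partner: f1(n) = (-1)^n f0(3 - n).
F1 = [(-1) ** n * F0[3 - n] for n in range(4)]


def shifted_inner(f, g, shift):
    """Return sum_n f(n) g(n - shift) for finite sequences that start at n = 0."""
    total = 0.0
    for n, fn in enumerate(f):
        m = n - shift
        if 0 <= m < len(g):
            total += fn * g[m]
    return total


n_k = 2
# Inner products <f_k(n), f_m(n - n_k p)> for p = -2..2 and all pairs (k, m).
table = {(k, m, p): shifted_inner(fk, fm, n_k * p)
         for k, fk in enumerate((F0, F1))
         for m, fm in enumerate((F0, F1))
         for p in range(-2, 3)}
```

Every entry of `table` is 1 for $k = m$, $p = 0$ and 0 otherwise. By contrast, `shifted_inner(F0, F0, 1)` equals 0.5625, so a receiver sampling at the wrong phase sees heavy overlap.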


3. Proposed equalizer structure 



The wavelet packet transform (IEEE 1992; Mathiarasan 1992) is a generalization of the Discrete Time Wavelet Transform (DTWT). The transform coefficients for the sequence $x(n)$ are given by

$$ X_k(n) = \sum_{m=-\infty}^{+\infty} x(m)\,h_k(n_k n - m), \quad k = 0, 1, \ldots, M-1, \qquad (1) $$

where the $n_k$'s are arbitrary positive integers which satisfy

$$ \sum_{k=0}^{M-1} \frac{1}{n_k} = 1, \qquad (2) $$


and the filters $h_k(n)$, $k = 0, 1, \ldots, M-1$, form the analysis bank of a non-uniform perfect reconstruction (PR) system. The decimation factor associated with the $k$th sub-band is $n_k$, and the bandwidth of the $k$th sub-band is nominally $\pi/n_k$. The non-uniform filter bank can be generated by cascading uniform filter banks together in an arbitrary tree structure.
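The tree construction can be sketched by recursively splitting bands with a two-band bank. The tree representation below is hypothetical and used only for illustration, and only two-band stages are modelled, although the text allows general uniform stages. The helper collects the leaf decimation factors. Any such tree satisfies (2) automatically, because each split replaces a term $1/n$ by $1/(2n) + 1/(2n)$.

```python
from fractions import Fraction


def leaf_decimations(tree, factor=1):
    """Decimation factors n_k of the leaves of a tree of two-band splits.

    A node is either None (a leaf, i.e. one sub-band of the final bank) or a
    pair (low_child, high_child): the node's band is split by a two-band PR
    bank and each child sees a further decimation by 2.
    """
    if tree is None:
        return [factor]
    low, high = tree
    return leaf_decimations(low, 2 * factor) + leaf_decimations(high, 2 * factor)


def check_condition_2(n):
    """Condition (2): the decimation factors satisfy sum_k 1/n_k = 1."""
    return sum(Fraction(1, nk) for nk in n) == 1


# Split the whole band, then split the lower half again: n = [4, 4, 2],
# i.e. nominal bandwidths pi/4, pi/4 and pi/2.
n = leaf_decimations(((None, None), None))
```

Exact rational arithmetic (`Fraction`) is used so that the check of (2) is not affected by rounding.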

Similarly, the inverse transform relation is given by

$$ x(n) = \sum_{k=0}^{M-1} \sum_{m=-\infty}^{+\infty} X_k(m)\,f_k(n - n_k m). \qquad (3) $$

The filters $f_k(n)$, $k = 0, 1, \ldots, M-1$, form the corresponding synthesis bank of the PR system.



78 


S Gracias and V U Reddy 


For perfect reconstruction, the analysis and synthesis filters have to satisfy the following conditions:

$$ f_k(n) = h_k(-n), \qquad (4) $$

and

$$ \sum_{n=-\infty}^{+\infty} f_k(n)\,f_m(n - n_{k,m}\,p) = \delta(k - m)\,\delta(p), \quad k, m = 0, 1, \ldots, M-1, \qquad (5) $$

where $n_{k,m} = \gcd(n_k, n_m)$.

The transform can also be interpreted as a projection of the sequence onto a set of orthonormal basis sequences $\eta_{km}(n)$, where

$$ \eta_{km}(n) = f_k(n - n_k m), \quad k = 0, 1, \ldots, M-1, \; m \in \mathbb{Z}. \qquad (6) $$

This orthonormal set of basis sequences is used as the "carrier" set for the data sequence. Before modulating these wave packets with data bits, we first split the data sequence into $M$ sub-sequences. Since the bandwidth of the $k$th wavelet packet is inversely proportional to $n_k$, the number of bits allocated to the $k$th wavelet packet should also be inversely proportional to $n_k$. For example, in the uniform case, we could split the sequence $a(n)$ into blocks of length $M$ and assign the $k$th element of every block to the $k$th sub-sequence.

The problem reduces to that of transmitting $M$ sub-sequences $\{a_k(n)\}_{k=0}^{M-1}$ over the channel. Each of these sequences can be used to modulate a set of wave packets $\{\eta_{km}(n)\}_{m \in \mathbb{Z}}$ to generate the transmitted sequence $t(n)$ as follows:

$$ t(n) = \sum_{k=0}^{M-1} \sum_{m=-\infty}^{+\infty} a_k(m)\,\eta_{km}(n). \qquad (7) $$

Note from (7) that the $m$th bit of the $k$th sub-sequence of the data modulates (i.e., multiplies with the amplitude of the bit) the $m$th wave packet of the $k$th set.

Combining (6) with (7), we get

$$ t(n) = \sum_{k=0}^{M-1} \sum_{m=-\infty}^{+\infty} a_k(m)\,f_k(n - n_k m), \qquad (8) $$

or, in the $z$ domain,

$$ T(z) = \sum_{k=0}^{M-1} A_k(z^{n_k})\,F_k(z) = \sum_{k=0}^{M-1} \big(A_k(z)\big)_{\uparrow n_k} F_k(z). \qquad (9) $$

That is, the modulation can be performed by passing the $M$ sub-sequences through a bank of expanders followed by a synthesis bank, as shown in figure 1.

Figure 1. A wavelet packet based equalization scheme: Transmitter section.

Figure 2. Noisy channel model.
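Equations (8) and (9) describe the same transmitted signal, once as a double sum and once as an expander followed by a synthesis filter in each branch. The sketch below checks this numerically for an assumed uniform two-band case ($M = 2$, $n_0 = n_1 = 2$). The Daubechies length-4 pair stands in for the synthesis filters $f_k(n)$, and all function names are hypothetical.

```python
import math

SQ3 = math.sqrt(3.0)
F0 = [c / (4.0 * math.sqrt(2.0)) for c in (1 + SQ3, 3 + SQ3, 3 - SQ3, 1 - SQ3)]
F1 = [(-1) ** n * F0[3 - n] for n in range(4)]
FILTERS, N_K = [F0, F1], [2, 2]


def split_uniform(data, M):
    """Uniform case: the kth element of every length-M block goes to sub-sequence k."""
    return [data[k::M] for k in range(M)]


def expand(x, L):
    """L-fold expander: insert L - 1 zeros after every sample."""
    y = [0.0] * (len(x) * L)
    y[::L] = x
    return y


def convolve(x, h):
    """Full linear convolution of two finite sequences starting at n = 0."""
    y = [0.0] * (len(x) + len(h) - 1)
    for i, xi in enumerate(x):
        for j, hj in enumerate(h):
            y[i + j] += xi * hj
    return y


def transmit(subseqs, filters=FILTERS, decimations=N_K):
    """t(n) as in (9): expand each sub-sequence by n_k, filter by f_k, and sum."""
    branches = [convolve(expand(a, nk), f) for a, f, nk in zip(subseqs, filters, decimations)]
    t = [0.0] * max(len(b) for b in branches)
    for b in branches:
        for n, v in enumerate(b):
            t[n] += v
    return t


def transmit_direct(subseqs, filters=FILTERS, decimations=N_K):
    """t(n) evaluated literally from the double sum in (8)."""
    length = max(len(a) * nk + len(f) - 1 for a, f, nk in zip(subseqs, filters, decimations))
    t = [0.0] * length
    for a, f, nk in zip(subseqs, filters, decimations):
        for m, amp in enumerate(a):
            for n in range(length):
                if 0 <= n - nk * m < len(f):
                    t[n] += amp * f[n - nk * m]
    return t
```

In practice the expander-filter form would itself be replaced by a polyphase implementation, which avoids multiplying by the inserted zeros.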


The modulated signal $t(n)$ is then transmitted over a noisy channel (see figure 2). To demodulate the received signal $r(n)$, we first pass it through a bank of matched filters. From (4), the matched filters are simply the corresponding analysis filters. Before decimating the output of the matched filter by a factor $n_i$, we need to compensate for the delays experienced by the different wave packets (i.e., the different carriers). The receiver structure consisting of the matched filters, delays and decimators (see figure 3) performs the role of the equalizer here. The combined system of figures 1 and 3 is called a transmultiplexer (Vaidyanathan 1993), as it converts a TDM signal into FDM, and vice versa. We will now consider the problem of determining the delays at the receiver end.


4. Analysis of the equalization scheme 

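Consider the block diagram of figure 1. The $z$ transform of the transmitted signal, $T(z)$, is given by

$$ T(z) = \sum_{k=0}^{M-1} \big(A_k(z)\big)_{\uparrow n_k} F_k(z) = \sum_{k=0}^{M-1} A_k(z^{n_k})\,F_k(z). \qquad (10) $$

The $z$ transform of the received signal, $R(z)$, is given by

$$ R(z) = T(z)\,C(z) + Q(z). \qquad (11) $$

After equalization in the $i$th branch, we have

$$ \hat{A}_i(z) = \big(R(z)\,H_i(z)\,z^{-\delta_i}\big)_{\downarrow n_i}, \quad 0 \le \delta_i \le n_i - 1. \qquad (12) $$

Using (10) and (11),

$$ \hat{A}_i(z) = \Big(\sum_{k=0}^{M-1} A_k(z^{n_k})\,F_k(z)\,C(z)\,H_i(z)\,z^{-\delta_i} + Q(z)\,H_i(z)\,z^{-\delta_i}\Big)_{\downarrow n_i}. \qquad (13) $$

Using (4), we get

$$ \hat{A}_i(z) = \Big(\sum_{k=0}^{M-1} A_k(z^{n_k})\,H_k(z^{-1})\,C(z)\,H_i(z)\,z^{-\delta_i} + Q(z)\,H_i(z)\,z^{-\delta_i}\Big)_{\downarrow n_i}. \qquad (14) $$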


Figure 3. A wavelet packet based equalization scheme: Receiver section. 

















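To simplify the notation, we define

$$ S_{ki}(z) = H_k(z^{-1})\,H_i(z), \qquad (15) $$

$$ D_{ki}(z) = S_{ki}(z)\,C(z). \qquad (16) $$

This gives

$$ \hat{A}_i(z) = \sum_{k=0}^{M-1} \big(A_k(z^{n_k})\,D_{ki}(z)\,z^{-\delta_i}\big)_{\downarrow n_i} + \big(Q(z)\,H_i(z)\,z^{-\delta_i}\big)_{\downarrow n_i}, \qquad (17) $$

or in the time domain

$$ \hat{a}_i(n) = \sum_{k=0}^{M-1} \sum_{m=-\infty}^{+\infty} a_k(m)\,d_{ki}(n_i n - n_k m - \delta_i) + \sum_{m=-\infty}^{+\infty} h_i(m)\,q(n_i n - m - \delta_i). \qquad (18) $$

In the noiseless case, (18) reduces to

$$ \hat{a}_i(n) = \sum_{k=0}^{M-1} \sum_{m=-\infty}^{+\infty} a_k(m)\,d_{ki}(n_i n - n_k m - \delta_i). \qquad (19) $$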

Thus, the output in the $i$th branch is not a delayed version of the input, even in the noiseless case. This is due to the interference between samples of the same branch signal as well as the interference across branches. We should choose $\delta_i$ such that this interference is minimized and the output $\hat{a}_i(n)$ is mapped to a delayed version of $a_i(n)$. The minimum square variance (MSV) algorithm developed below meets this objective. We will motivate this algorithm for the noiseless case.

5. Motivation of the minimum square variance algorithm

Consider (19). If $d_{ki}(n_i n - n_k m - \delta_i) = 0$ for $k \ne i$, the interference across branches will be zero. Further, if $d_{ii}(n_i n - n_i m - \delta_i)$ is a delta sequence, the interference between samples of the same branch will be zero. Now the question we ask is the following: will an appropriate choice of $\delta_i$ force the above-mentioned conditions on $d_{ki}(\cdot)$? To see this, we first investigate the properties of $d_{ki}(\cdot)$.

We begin by noting that, using Euclid's identity, we can make the substitution $n_i n - n_k m = n_{k,i}\,p$, where $p$ is some arbitrary integer and $n_{k,i} = \gcd(n_k, n_i)$. Thus, instead of investigating the properties of $d_{ki}(n_i n - n_k m - \delta_i)$, we look at the properties of $d_{ki}(n_{k,i} m - \delta_i)$. Using Parseval's theorem and the fact that the wavelet packets and their shifted versions form an orthonormal set, it can be shown that (Gracias 1994)

$$ \sum_{k=0}^{M-1} \sum_{m=-\infty}^{+\infty} d_{ki}^2(n_{k,i} m - \delta_i) = \frac{1}{2\pi} \int_0^{2\pi} \big|C(e^{j\theta})\,H_i(e^{j\theta})\big|^2 \, d\theta. \qquad (20) $$

We note from (20) that $\sum_{k=0}^{M-1} \sum_{m} d_{ki}^2(n_{k,i} m - \delta_i)$ is independent of $\delta_i$ and is equal to the channel energy in the portion of the band specified by the $i$th branch.
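Relation (20) can be checked numerically. The sketch assumes a two-band bank ($M = 2$, $n_0 = n_1 = 2$, so $n_{k,i} = 2$) built from the Daubechies length-4 filter and its alternating-flip partner, together with a hypothetical three-tap channel. It forms $d_{ki} = f_k * h_i * c$ with $h_i(n) = f_i(-n)$, which is the sequence behind $D_{ki}(z) = H_k(z^{-1}) H_i(z) C(z)$. It then compares the decimated energy on the left of (20), for both values of $\delta_i$, with the energy of $c * h_i$, which equals the integral on the right by Parseval's theorem.

```python
import math

SQ3 = math.sqrt(3.0)
F0 = [c / (4.0 * math.sqrt(2.0)) for c in (1 + SQ3, 3 + SQ3, 3 - SQ3, 1 - SQ3)]
F1 = [(-1) ** n * F0[3 - n] for n in range(4)]
F = [F0, F1]  # f_k(n), n = 0..3; stand-in two-band bank


def conv(x, y):
    """Convolve sequences given as (values, index_of_first_value)."""
    xv, x0 = x
    yv, y0 = y
    out = [0.0] * (len(xv) + len(yv) - 1)
    for i, a in enumerate(xv):
        for j, b in enumerate(yv):
            out[i + j] += a * b
    return out, x0 + y0


def time_reverse(x):
    """x(n) -> x(-n), keeping track of the index of the first value."""
    xv, x0 = x
    return xv[::-1], -(x0 + len(xv) - 1)


def lhs_20(channel, i, delta, n=2):
    """sum_k sum_m d_ki(n m - delta)^2 with d_ki = f_k * h_i * c and h_i(n) = f_i(-n)."""
    h_i = time_reverse((F[i], 0))
    total = 0.0
    for f_k in F:
        d_vals, d0 = conv(conv((f_k, 0), h_i), channel)
        for idx, v in enumerate(d_vals):
            if (d0 + idx + delta) % n == 0:  # sample index has the form n*m - delta
                total += v * v
    return total


def rhs_20(channel, i):
    """Energy of c * h_i, i.e. (1/2pi) * integral of |C H_i|^2 by Parseval."""
    g_vals, _ = conv(channel, time_reverse((F[i], 0)))
    return sum(v * v for v in g_vals)


CHANNEL = ([1.0, 0.45, -0.2], 0)  # hypothetical FIR channel c(n), n = 0, 1, 2
```

For each branch, `lhs_20` returns the same value for $\delta_i = 0$ and $\delta_i = 1$, and that value matches `rhs_20`. This is the $\delta_i$-independence the argument relies on.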

Now, squaring the LHS of (20) gives

$$ \Big(\sum_{k=0}^{M-1} \sum_{m=-\infty}^{+\infty} d_{ki}^2(n_{k,i} m - \delta_i)\Big)^2 = \sum_{k=0}^{M-1} \sum_{m=-\infty}^{+\infty} d_{ki}^4(n_{k,i} m - \delta_i) + \text{non-negative cross-terms}. \qquad (21) $$

Since each term on the RHS of (21) is non-negative, we get the following inequality:

$$ \sum_{k=0}^{M-1} \sum_{m=-\infty}^{+\infty} d_{ki}^4(n_{k,i} m - \delta_i) \le \Big(\sum_{k=0}^{M-1} \sum_{m=-\infty}^{+\infty} d_{ki}^2(n_{k,i} m - \delta_i)\Big)^2. \qquad (22) $$

Equality in (22) holds only if all the cross-terms are zero. This can happen only if exactly one of the terms $d_{ki}(\cdot)$ of the sum is non-zero (the trivial case of equality, when all the terms are identically zero, is not permissible since the RHS of (20) is guaranteed to be positive). If the wave packets are chosen with a small overlap in the frequency domain, then $H_k(z^{-1}) H_i(z)$, $i \ne k$, will be close to zero. This implies from (16) that $d_{ki}(n)$, $i \ne k$, will be close to zero. Thus equality holds only if $d_{ii}(n_i m - \delta_i)$ is a delta sequence and $d_{ki}(n_{k,i} m - \delta_i)$, $i \ne k$, is identically zero. If $d_{ii}(n_i m - \delta_i)$ is a delta sequence, then the output sequence is a delayed version of the input sequence, i.e., there is no ISI.

Note from (20) that the RHS of (22) is independent of $\delta_i$. This suggests that if we choose $\delta_i$ to maximize the LHS of (22), then the sequence $d_{ki}(n_{k,i} m - \delta_i)$ will approximately assume the properties mentioned above, thereby minimizing the ISI.

In practice, the channel response is unknown, and hence $d_{ki}(\cdot)$ is unknown. Thus, we have to make the appropriate choice of $\delta_i$ based on the output signal $\hat{a}_i(n)$ and its statistics.


6. Minimum square variance algorithm 


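In the previous sections, we have shown that, to minimize the ISI, we should choose $\delta_i$ such that $\sum_{k=0}^{M-1} \sum_{m} d_{ki}^4(n_{k,i} m - \delta_i)$ is maximized. Since $\sum_{k=0}^{M-1} \sum_{m} d_{ki}^2(n_{k,i} m - \delta_i)$ is independent of $\delta_i$ (see (20)), we can rewrite this as

$$ \max_{\delta_i} \sum_{k=0}^{M-1} \sum_{m=-\infty}^{+\infty} d_{ki}^4(n_{k,i} m - \delta_i) \iff \min_{\delta_i} \Bigg[\Big(\sum_{k=0}^{M-1} \sum_{m=-\infty}^{+\infty} d_{ki}^2(n_{k,i} m - \delta_i)\Big)^2 - \sum_{k=0}^{M-1} \sum_{m=-\infty}^{+\infty} d_{ki}^4(n_{k,i} m - \delta_i)\Bigg]. \qquad (23) $$

From the statistics of $a_i(n)$ (independent data symbols, equally likely to be $\pm\sigma$), we can show that

$$ \mathrm{var}\big[\hat{a}_i^2(n)\big] = 2\sigma^4 \Bigg[\Big(\sum_{k=0}^{M-1} \sum_{m=-\infty}^{+\infty} d_{ki}^2(n_{k,i} m - \delta_i)\Big)^2 - \sum_{k=0}^{M-1} \sum_{m=-\infty}^{+\infty} d_{ki}^4(n_{k,i} m - \delta_i)\Bigg]. \qquad (24) $$

From (24) and (23), we get the following relation:

$$ \max_{\delta_i} \sum_{k=0}^{M-1} \sum_{m=-\infty}^{+\infty} d_{ki}^4(n_{k,i} m - \delta_i) \iff \min_{\delta_i} \mathrm{var}\big[\hat{a}_i^2(n)\big]. \qquad (25) $$

Thus, if we choose $\delta_i$ such that $\mathrm{var}[\hat{a}_i^2(n)]$ is minimized, then the ISI will be minimized.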
We call the algorithm which performs this minimization the minimum square variance (MSV) algorithm. The block diagram of the MSV algorithm in the $i$th branch is given in figure 4.

Figure 4. Block diagram of the MSV algorithm in the $i$th branch.
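Relation (24) can be checked by brute force. The sketch enumerates every sign pattern of a few symbols and compares the exact variance of the squared output with the closed form. It assumes independent, equiprobable symbols $\pm\sigma$ (the simulations in the paper use $\pm 1$). `coeffs` stands for the samples $d_{ki}(n_{k,i} m - \delta_i)$ that combine into one output sample, and the values chosen are arbitrary. For binary symbols $a^2 = \sigma^2$ exactly, which is why no separate kurtosis term appears.

```python
from itertools import product


def var_of_square(coeffs, sigma=1.0):
    """Exact var[(sum_j c_j a_j)^2] for i.i.d. equiprobable a_j in {+sigma, -sigma}."""
    values = []
    for signs in product((sigma, -sigma), repeat=len(coeffs)):
        y = sum(c * a for c, a in zip(coeffs, signs))
        values.append(y * y)
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


def var_formula_24(coeffs, sigma=1.0):
    """2 sigma^4 [ (sum c^2)^2 - sum c^4 ], the right-hand side of (24)."""
    s2 = sum(c * c for c in coeffs)
    s4 = sum(c ** 4 for c in coeffs)
    return 2 * sigma ** 4 * (s2 * s2 - s4)
```

A single non-zero coefficient (a delta sequence, i.e. no ISI) gives zero variance, which is the condition the MSV algorithm searches for.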

The output of the $i$th receiver filter is decomposed into its $n_i$ polyphase components. At every $n_i$th instant, the sample variance of the square of each of these components is updated. The polyphase component with the smallest sample variance is declared as the output of the branch. We can restate the algorithm formally as follows.

For each branch at the receiver,

• split the filter output into blocks of length $n_i$,

• set up $n_i$ registers to hold the sample variances,

• initialize these registers to zero,

• use the square of the $k$th sample of each block to update the variance in the $k$th register,

• declare the $k$th sample of the block to be the desired received output of the branch if the value of the $k$th register is the minimum.
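The steps above can be sketched as a small running selector for one branch. Several details are assumptions made for illustration: Welford's update is used for the running variance, ties go to the lowest index, and the class name is hypothetical. The test signal is synthetic. Phase 1 carries clean $\pm 1$ symbols, and phase 0 carries a two-symbol mixture imitating ISI.

```python
class MSVBranch:
    """Running MSV delay selector for one receiver branch (an illustrative sketch).

    Register k holds the sample variance of the squared kth sample of every
    length-n_i block of matched-filter output (Welford's running update).
    """

    def __init__(self, n_i):
        self.n_i = n_i
        self.count = 0
        self.mean = [0.0] * n_i  # running mean of the squared samples, per register
        self.m2 = [0.0] * n_i    # running sum of squared deviations, per register

    def variances(self):
        """Current sample variance held in each register (all zero before any data)."""
        if self.count == 0:
            return [0.0] * self.n_i
        return [m2 / self.count for m2 in self.m2]

    def update(self, block):
        """Consume one block of n_i samples; return (selected index, branch output)."""
        if len(block) != self.n_i:
            raise ValueError("block must contain exactly n_i samples")
        self.count += 1
        for k, sample in enumerate(block):
            x = sample * sample
            diff = x - self.mean[k]
            self.mean[k] += diff / self.count
            self.m2[k] += diff * (x - self.mean[k])
        var = self.variances()
        best = min(range(self.n_i), key=var.__getitem__)
        return best, block[best]
```

Example usage: feed blocks `[0.5 * (d[j - 1] + d[j]), d[j]]` for a ±1 sequence `d`. Once the mixed phase has shown two different squared values, register 1 (variance exactly zero) wins and the branch output reproduces `d`.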

7. Performance of the MSV algorithm in a simple delay channel case 

We will now explore how the MSV algorithm performs in the simple case of a noiseless delay channel. Assuming

$$ C(z) = z^{-\gamma} \qquad (26) $$

and using (16), we get

$$ D_{ki}(z) = z^{-\gamma}\,S_{ki}(z). \qquad (27) $$

Substituting (27) into (17) with $Q(z) = 0$ gives

$$ \hat{A}_i(z) = \sum_{k=0}^{M-1} \big(A_k(z^{n_k})\,S_{ki}(z)\,z^{-\gamma-\delta_i}\big)_{\downarrow n_i}. \qquad (28) $$

Suppose we choose $\delta_i$ such that

$$ -\gamma - \delta_i = -p\,n_i, \qquad (29) $$

where $p$ is some arbitrary integer (note that this is always possible, as $0 \le \delta_i \le n_i - 1$).

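Using the Noble identities, (28) reduces to

$$ \hat{A}_i(z) = \sum_{k=0}^{M-1} \Big(A_k(z^{n_k/n_{k,i}})\,z^{-p\,n_i/n_{k,i}}\,\big(S_{ki}(z)\big)_{\downarrow n_{k,i}}\Big)_{\downarrow n_i/n_{k,i}}. \qquad (30) $$

Now

$$ \big(S_{ki}(z)\big)_{\downarrow n_{k,i}} = \big(H_k(z^{-1})\,H_i(z)\big)_{\downarrow n_{k,i}}. \qquad (31) $$

The RHS of the above equation is just a rewriting of the orthonormality condition (5) in the $z$-domain. Thus, we have

$$ \big(S_{ki}(z)\big)_{\downarrow n_{k,i}} = \delta(k - i). \qquad (32) $$

Using (32) in (30), we have

$$ \hat{A}_i(z) = \sum_{k=0}^{M-1} \Big(A_k(z^{n_k/n_{k,i}})\,z^{-p\,n_i/n_{k,i}}\,\delta(k - i)\Big)_{\downarrow n_i/n_{k,i}} = \Big(A_i(z^{n_i/n_{i,i}})\,z^{-p\,n_i/n_{i,i}}\Big)_{\downarrow n_i/n_{i,i}} = A_i(z)\,z^{-p}. \qquad (33) $$

Thus, if we choose $\delta_i$ according to (29), then the output is an undistorted version of the input.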
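The result can be reproduced end to end with a minimal simulation. The sketch assumes a two-band bank ($n_0 = n_1 = 2$) built from the Daubechies length-4 pair and a noiseless channel $z^{-\gamma}$; function names are hypothetical. In (12) the matched filter $h_i(n) = f_i(-n)$ is non-causal. The sketch instead uses the causal time-reversed filter, which adds $\mathrm{len}(f_i) - 1$ samples of delay. The phase predicted by (29) therefore appears at polyphase index $(\gamma + \mathrm{len}(f_i) - 1) \bmod n_i$, and that component returns the transmitted symbols exactly.

```python
import math

SQ3 = math.sqrt(3.0)
F0 = [c / (4.0 * math.sqrt(2.0)) for c in (1 + SQ3, 3 + SQ3, 3 - SQ3, 1 - SQ3)]
F1 = [(-1) ** n * F0[3 - n] for n in range(4)]
FILTERS, N_K = [F0, F1], [2, 2]  # stand-in two-band bank, n_0 = n_1 = 2


def convolve(x, h):
    y = [0.0] * (len(x) + len(h) - 1)
    for i, xi in enumerate(x):
        for j, hj in enumerate(h):
            y[i + j] += xi * hj
    return y


def expand(x, L):
    y = [0.0] * (len(x) * L)
    y[::L] = x
    return y


def transmit(subseqs):
    """Equation (8): expander + synthesis filter in every branch, then sum."""
    parts = [convolve(expand(a, n), f) for a, f, n in zip(subseqs, FILTERS, N_K)]
    t = [0.0] * max(len(p) for p in parts)
    for p in parts:
        for n, v in enumerate(p):
            t[n] += v
    return t


def delay_channel(t, gamma):
    """Noiseless C(z) = z^(-gamma)."""
    return [0.0] * gamma + list(t)


def branch_phases(r, i):
    """Causal matched filter f_i(len - 1 - n), then split into n_i polyphase components."""
    y = convolve(r, FILTERS[i][::-1])
    return [y[phase::N_K[i]] for phase in range(N_K[i])]


def recover(r, i, gamma, length):
    """Pick the phase predicted by (29), shifted by the causal filter delay len(f_i) - 1."""
    total = gamma + len(FILTERS[i]) - 1
    phase, lag = total % N_K[i], total // N_K[i]
    return branch_phases(r, i)[phase][lag:lag + length]
```

The integer `lag` plays the role of $p$ in (33): the recovered sequence is the input, delayed by a whole number of symbols.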

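To see what the MSV algorithm gives, substitute (27) in (24). This gives

$$ \mathrm{var}\big[\hat{a}_i^2(n)\big] = 2\sigma^4 \Bigg[\Big(\sum_{k=0}^{M-1} \sum_{m=-\infty}^{+\infty} s_{ki}^2(n_{k,i} m - \gamma - \delta_i)\Big)^2 - \sum_{k=0}^{M-1} \sum_{m=-\infty}^{+\infty} s_{ki}^4(n_{k,i} m - \gamma - \delta_i)\Bigg]. \qquad (34) $$

If $\delta_i$ satisfies (29), then

$$ \mathrm{var}\big[\hat{a}_i^2(n)\big] = 2\sigma^4 \Bigg[\Big(\sum_{k=0}^{M-1} \sum_{m=-\infty}^{+\infty} s_{ki}^2(n_{k,i} m - n_i p)\Big)^2 - \sum_{k=0}^{M-1} \sum_{m=-\infty}^{+\infty} s_{ki}^4(n_{k,i} m - n_i p)\Bigg]. \qquad (35) $$

Making the substitution $l = m - (n_i/n_{k,i})\,p$ in (35), we get

$$ \mathrm{var}\big[\hat{a}_i^2(n)\big] = 2\sigma^4 \Bigg[\Big(\sum_{k=0}^{M-1} \sum_{l=-\infty}^{+\infty} s_{ki}^2(n_{k,i} l)\Big)^2 - \sum_{k=0}^{M-1} \sum_{l=-\infty}^{+\infty} s_{ki}^4(n_{k,i} l)\Bigg]. \qquad (36) $$

Now, using (15) and (5),

$$ s_{ki}(n_{k,i} l) = \sum_{n=-\infty}^{+\infty} f_k(n)\,f_i(n - n_{k,i} l) = \delta(k - i)\,\delta(l), \quad k, i = 0, 1, \ldots, M-1. \qquad (37) $$

From (37), we obtain

$$ \mathrm{var}\big[\hat{a}_i^2(n)\big] = 2\sigma^4 \Bigg[\Big(\sum_{k=0}^{M-1} \sum_{l=-\infty}^{+\infty} \big(\delta(k - i)\,\delta(l)\big)^2\Big)^2 - \sum_{k=0}^{M-1} \sum_{l=-\infty}^{+\infty} \big(\delta(k - i)\,\delta(l)\big)^4\Bigg] = 2\sigma^4 (1 - 1) = 0. \qquad (38) $$

Thus, for $\delta_i$ satisfying (29), the variance of $\hat{a}_i^2(n)$ is identically zero. For all other values of $\delta_i$, the variance would be non-zero, as (37) can no longer be applied. We can therefore conclude that for a simple delay channel (with no noise), the MSV algorithm will yield the proper value of the delay.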

7.1 Computational issues 

Suppose the data sequence is arriving at a rate $B$ (with respect to some system of units). At the transmitter end, the sequence is split into $M$ sub-sequences. The $i$th sub-sequence will be at the rate $B/n_i$. The sub-sequences are passed through $n_i$-fold expanders, so the filtering operations have to be performed at the rate $B$. Since the bandwidth of the $i$th filter is inversely proportional to $n_i$, we can assume that its length is $n_i L$. Thus, the computation rate in each branch is approximately $n_i L B$, and the total computation rate at the transmitter end is approximately $\sum_{i=0}^{M-1} n_i L B$. However, if we use the polyphase representation to implement the $n_i$-fold expander and filter cascade, then we can reduce the rate in each branch by a factor $n_i$ (Vaidyanathan 1993). This makes the computation rate for the transmitter section approximately $MLB$.

The computational rate at the transmitter can be further reduced if the filter bank is implemented using a tree structure. For example, consider the uniform case, with $M = 2^p$. Instead of implementing the filter bank as a set of $M$-fold expander-filter combinations, we could implement it as a cascade of $p$ stages of 2-fold expander-filter combinations. The filter lengths would be $ML$ and $2L$, respectively. This would reduce the computation rate to $2pLB$, which compares favorably with FFT-based schemes.

The sequences are then combined and transmitted over the channel at the rate $B$. As in the case of the transmitter, the total computation rate for demodulation is approximately $\sum_{i=0}^{M-1} n_i L B$. The polyphase representation cannot be used here, as all the polyphase components of the received signal are required to make the decision.

Since $\sum_{k=0}^{M-1} \sum_{m} d_{ki}^2(n_{k,i} m - \delta_i)$ is independent of $\delta_i$, the mean of $\hat{a}_i^2(n)$ does not depend on $\delta_i$, and minimizing the variance of $\hat{a}_i^2(n)$ is equivalent to minimizing the fourth moment of $\hat{a}_i(n)$. Updating the sample fourth moment requires 4 multiplications and 1 addition. If we consider only the multiplications, the computation rate is $4MB$, since there are $n_i$ polyphase components in each branch, and these components are arriving at a rate of $B/n_i$. Thus, the total computation rate at the receiver is $(4M + \sum_{i=0}^{M-1} n_i L)B$. Once the algorithm converges, only the polyphase component corresponding to the selected delay has to be computed. This brings down the computation rate at the receiver to $MLB$.
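The rate estimates of this subsection are simple enough to tabulate in code. The helper names below are hypothetical and the formulas are copied from the text. The worked example assumes a uniform four-band case ($n_i = 4$) with base length $L = 8$ and $B = 1$.

```python
def transmitter_rates(n, L, B):
    """Approximate multiplication rates of the transmitter (the text's estimates).

    n: decimation factors n_i; L: base length, branch i uses an n_i * L tap filter;
    B: input data rate.
    """
    M = len(n)
    return {
        "direct": sum(ni * L * B for ni in n),  # each expander-filter runs at rate B
        "polyphase": M * L * B,                 # polyphase saves a factor n_i per branch
    }


def tree_transmitter_rate(p, L, B):
    """Uniform M = 2**p bank built as p cascaded two-band stages: about 2pLB."""
    return 2 * p * L * B


def receiver_rates(n, L, B):
    """Matched filtering at rate B in every branch plus 4 multiplications per
    polyphase sample for the fourth-moment update; after convergence only one
    polyphase component per branch is computed."""
    M = len(n)
    return {
        "adapting": (4 * M + sum(ni * L for ni in n)) * B,
        "converged": M * L * B,
    }
```

With $n_i = 4$, $L = 8$ and $B = 1$, the transmitter needs 128 multiplications per unit time directly and 32 in polyphase form. The adapting receiver needs 144, dropping to 32 after convergence. For uniform banks the adapting receiver's filtering term is $M^2 L B$, so the per-sample cost grows as $O(M^2)$.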

The equalization scheme has been proposed for a stream of bits. At the transmitter, 
we map the bits 0 and 1 to the levels +a and —a, respectively, to generate the input to 
the equalizer. Similarly, at the receiver the received signal (after equalization) has to be 
mapped back to a bit stream. This can be done using an appropriate threshold. 
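A minimal sketch of this mapping and of the threshold decision follows. It assumes a threshold of zero, which suits the symmetric levels $\pm a$; the text only asks for an appropriate threshold.

```python
def bits_to_levels(bits, a=1.0):
    """Transmitter mapping from the text: bit 0 -> +a, bit 1 -> -a."""
    return [a if b == 0 else -a for b in bits]


def levels_to_bits(samples, threshold=0.0):
    """Receiver decision: a sample above the threshold is read as bit 0, otherwise bit 1."""
    return [0 if s > threshold else 1 for s in samples]
```

Small perturbations of the equalized samples do not change the decisions as long as they do not cross the threshold: `levels_to_bits([0.8, -1.3, -0.1, 0.05])` gives `[0, 1, 1, 0]`.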



The equalizer scheme is based on the fact that the channel can be approximated by a simple delay in a sufficiently narrow band. For the $M$-branch (uniform case) equalizer, each wavelet packet has a bandwidth of $\pi/M$. Clearly, if we increase $M$, the approximation gets better. However, we have seen that the computational complexity per sample is $O(M^2)$. Thus, increasing $M$ imposes a heavy computational burden on the system.

The design of appropriate wavelet packets is equivalent to the design of a non-uniform PR filter bank. This can be accomplished by cascading appropriate (uniform) paraunitary filter banks. A design technique for such banks, based on cosine modulation (Koilpillai & Vaidyanathan 1992), requires the desired length of the filters and a cost function (to be minimized) as design parameters. Recall from the previous sections that the inter-branch interference in the $i$th branch is small if $d_{ki}(n_{k,i} n - \delta_i)$, $i \ne k$, is close to zero. In the absence of any information about the channel, we could use the cost function $\sum_{n=-\infty}^{+\infty} s_{ki}^2(n)$, $i \ne k$, for the design of the wave packets. This will ensure that $d_{ki}(n)$, $i \ne k$, is small (see (16)), and hence the inter-branch interference using the designed wave packets is small.


8. Simulations 

In the simulations, we considered the uniform case, i.e., $n_i = M$, $i = 0, 1, \ldots, M-1$.

The input was a random sequence taking the values +1 and −1 with equal probability. We first consider the case of a simple delay channel, $C(z) = z^{-1}$. The received signal for the two-branch equalizer with compensating delays of zero and one is shown in figure 5. It is clear from the figure that the output depends critically on the delay chosen: a wrong choice of delay would result in wrong decoding of the received signal. This clearly illustrates the problem of picking the correct sample at the output of the matched filter. The delay computed by the MSV algorithm for this case is shown in figure 6.


Figure 5. Received signals for a simple delay channel with a two-branch equalizer with $\delta_i$ = 0 and 1. (a) i = 0, $\delta_i$ = 0. (b) i = 0, $\delta_i$ = 1. (c) i = 1, $\delta_i$ = 0. (d) i = 1, $\delta_i$ = 1.

Figure 6. Delay computed by the MSV algorithm for the two-branch equalizer for a simple delay channel in the noiseless case. (a) i = 0. (b) i = 1.


To study the effect of noise, we consider the simple delay channel with zero-mean white Gaussian noise for the two-branch case. The delays computed by the MSV algorithm for the noisy case with SNRs of 10 dB and 5 dB are shown in figure 7. We note that the algorithm takes longer to converge, but the converged value of the delay is unaffected by noise. The convergence of the fractional error for the noiseless case and for the noisy case with SNRs of 10 dB and 5 dB is shown in figure 8. Note that the noise affects the decoding to a bit stream and hence the steady-state error.

To test the wavelet packet based equalization scheme and the MSV algorithm, simulations were carried out using three channels (denoted by A, B and C, respectively) with the impulse responses shown in figure 9.

Figure 7. Delay computed by the MSV algorithm for the two-branch equalizer for a simple delay channel in the noisy case. (a) i = 0, (b) i = 1 (SNR = 10 dB). (c) i = 0, (d) i = 1 (SNR = 5 dB).

Figure 8. Fraction of bits in error for a delay channel for a two-branch equalizer with additive noise. (a) i = 0. (b) i = 1.


The fractional error (the fraction of bits in error, i.e., the number of bits in error up to the nth instant divided by the total number of bits received up to that instant) at the receiver output is shown in figure 10 for a typical two-branch equalizer for the three channels A, B and C. Note that the steady-state error is minimum for channel C and maximum for channel B. This implies that channel B causes the most ISI, which is evident from the impulse response of channel B, where two coefficients have nearly the same amplitude.

Figure 9. Impulse responses of three typical channels. (a) Channel A. (b) Channel B. (c) Channel C.

Figure 10. Fraction of bits in error for a typical two-branch equalizer for different channels. (a) i = 0. (b) i = 1.
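The fractional error plotted in these figures is a running ratio. A minimal Python sketch, assuming the transmitted and decided bits are available as equal-length sequences (the function name `fractional_error` is hypothetical):

```python
from itertools import accumulate


def fractional_error(sent, decided):
    """Running fraction of bits in error: errors up to instant n divided by the n + 1 bits received so far."""
    errors_so_far = accumulate(int(s != d) for s, d in zip(sent, decided))
    return [errors / (n + 1) for n, errors in enumerate(errors_so_far)]


print(fractional_error([0, 1, 1, 0], [0, 0, 1, 0]))  # [0.0, 0.5, 0.3333333333333333, 0.25]
```

As n grows the curve settles to its steady-state value, which is the quantity compared across channels above.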

The effect of increasing the number of branches is shown in figure 11, which gives the 
plot of the fractional error in the zeroth branch for equalizers with two, three and five 
branches, respectively. The steady-state fractional error is lowest in the five-branch case. 
This is because in the five-branch case, each wavelet packet occupies a comparatively 
smaller bandwidth and hence is relatively undistorted. 
Figure 11. Fraction of bits in error for the zeroth branch of an M-branch equalizer.

Conclusions

The paper addresses the problem of channel equalization using wavelet packets. Wavelet packets were introduced as a natural generalization of the orthonormal basis sequences used in Fourier analysis, namely, the sinusoids. The proposed equalizer scheme was based on the fact that wavelet packets, being narrow-band, suffer only attenuation and delay when passed through a non-ideal channel.

The equalizer structure, comprising a wavelet packet modulator, a compensating delay and a matched-filter demodulator, was shown to be easily implementable in terms of filters and other multirate components. An algorithm to choose the delay values adaptively for each wavelet packet was motivated using the inherent multirate and orthonormal properties of the wavelet packet set. The algorithm (called the MSV algorithm) uses the variance of the square of the received sequence to choose the value of the compensating delay that minimizes the ISI.
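The minimum-variance-of-squares criterion can be illustrated on a toy problem. The sketch below is not the authors' wavelet-packet implementation: it assumes bipolar symbols sent as rectangular pulses of L samples through a pure-delay channel, an integrate-and-dump receiver, and an exhaustive search over candidate delays (all names are hypothetical). With the correct delay every output is ±1, so the variance of the squared outputs is zero; a wrong delay mixes adjacent symbols and makes that variance positive.

```python
import random
from statistics import pvariance


def received_samples(symbols, L, delay):
    """Bipolar symbols sent as rectangular pulses of L samples through a pure delay of `delay` samples."""
    return [0.0] * delay + [float(a) for a in symbols for _ in range(L)]


def integrate_and_dump(r, L, d, count):
    """Average L samples per symbol, with the symbol windows starting at candidate delay d."""
    return [sum(r[d + k * L: d + (k + 1) * L]) / L for k in range(count)]


def min_square_variance_delay(r, L, candidates, count):
    """Choose the delay whose outputs y_k have the smallest variance of y_k ** 2."""
    def score(d):
        return pvariance([y * y for y in integrate_and_dump(r, L, d, count)])
    return min(candidates, key=score)


rng = random.Random(3)
symbols = [rng.choice((-1, 1)) for _ in range(200)]
r = received_samples(symbols, L=8, delay=5)
print(min_square_variance_delay(r, 8, range(8), 190))  # recovers the channel delay of 5 samples
```

The search is exhaustive only for clarity; an adaptive scheme would update the delay estimate as samples arrive.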

Simulations were carried out to test the equalizer structure and the MSV algorithm. The two-branch equalizer was tested for different channel models and also for various noise levels. The algorithm converges rapidly, and the steady-state fractional error (which asymptotically becomes the probability of error) is low except for cases where the channel-generated interference is pronounced. It is demonstrated that such cases can be tackled by increasing the number of branches in the equalizer, i.e., by appropriately selecting the wavelet packets.
Code design using deterministic annealing

BINOY JOSEPH and ANAMITRA MAKUR 

Department of Electrical Communication Engineering, Indian Institute of 
Science, Bangalore 560 012, India 

Abstract. We address the problem of designing codes for specific applications using deterministic annealing. Designing a block code over any finite-dimensional space may be thought of as forming the corresponding number of clusters over that space. We have shown that the total distortion incurred in encoding a training set is related to the probability of correct reception over a symmetric channel. While conventional deterministic annealing makes use of the squared Euclidean distance measure, we have developed an algorithm that can be used for clustering with Hamming distance as the distance measure, which is required in the error-correcting scenario.

Keywords. Error correcting code; Hamming distance; deterministic annealing.


1. Introduction 

A desirable property of an error correcting code without memory is that the region around each codeword should be as large as possible, so that a larger number of errors can be corrected. Figure 1 shows a geometrical view of some codewords with the corresponding non-overlapping regions, where points denote possible received vectors. At the same time we would also like the number of codewords in the codebook to be as large as possible, so that more input messages can be encoded using the code. These two requirements obviously conflict. The codewords used are vectors of a particular dimension over an alphabet of size q. If there exists a Galois field of order q (i.e., if q is a prime or a prime power), then efficient families of codes are known which make use of polynomial arithmetic, and upper bounds on the number of possible codewords exist. For values of q for which no Galois field exists, few codes are known, and the maximum number of codewords that we can have is also unknown.

A possible method to form error correcting codes over such q would be to use a clustering algorithm and pick out the codewords as the cluster centres. Efforts to use clustering for designing codes can be found in El Gamal et al (1987). Figure 1 may also be viewed as the outcome of a clustering algorithm, showing some clusters, cluster centres, and input data points. In the case of designing a code, our input data is the set of all possible q-ary n-tuples. While designing codes, our objective would be to form homogeneous clusters of the required size over q-ary n-tuples, such that a code with a given rate performs the best (for some channel). Clustering algorithms usually minimize the total (or average) distortion incurred if the representative vectors were used to encode the given data. It will be shown in the next section that if the distance measure used is the Hamming distance, then these two objectives are equivalent.

Figure 1. Geometrical view of a code/cluster.

For example, consider the binary [7,4] Hamming code. Viewed geometrically, it consists of 16 codewords which may be considered as the representative vectors of 16 different clusters. Each cluster consists of eight binary 7-tuples which are at a Hamming distance of 0 or 1 from the respective codeword. The clusters are non-overlapping, and hence any single error can be corrected at the receiver. The distortion is minimum if we use the Hamming code to encode all possible binary 7-tuples; if any other 16 vectors are used for this purpose, the total distortion is higher. Thus a direct relationship exists between minimum distortion clustering and code design, and codes may be designed using the same philosophy as used for generating clusters from a given set of input data.
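The distortion claim for the [7,4] Hamming code can be checked by brute force. A minimal Python sketch, assuming one standard systematic choice of parity equations (any equivalent Hamming code gives the same count): it encodes every binary 7-tuple by its nearest codeword and sums the Hamming distances.

```python
from itertools import product


def hamming_7_4_codewords():
    """All 16 codewords of a systematic binary [7,4] Hamming code (one standard parity choice)."""
    words = []
    for d1, d2, d3, d4 in product((0, 1), repeat=4):
        p1 = (d1 + d2 + d4) % 2
        p2 = (d1 + d3 + d4) % 2
        p3 = (d2 + d3 + d4) % 2
        words.append((d1, d2, d3, d4, p1, p2, p3))
    return words


def hamming_distance(a, b):
    return sum(x != y for x, y in zip(a, b))


def total_distortion(codebook, n, q=2):
    """Encode every q-ary n-tuple by its nearest codeword and sum the Hamming distances."""
    return sum(min(hamming_distance(x, c) for c in codebook)
               for x in product(range(q), repeat=n))


print(total_distortion(hamming_7_4_codewords(), 7))  # 112: 16 tuples at distance 0, 112 at distance 1
```

For comparison, the 16 words whose last three bits are fixed to 000 give a total distortion of 192.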

Many clustering algorithms use real arithmetic in forming the clusters. If we use such algorithms for code design, then we have to translate the resulting cluster centres back to the finite alphabet. Generally, for non-binary cases this is not easy, as there exists no simple relationship between the distance measure in real arithmetic and that in the finite alphabet. So we need an algorithm which can do clustering in a finite alphabet with Hamming distance as the distance measure.


2. Performance analysis of codes designed using clustering 

We show that there exists a direct relationship between the total Hamming distortion incurred if the codewords were used for encoding the space to which they belong and the probability of error while decoding at the receiver. Transmission over a q-ary symmetric channel is assumed. Let c_0, c_1, ..., c_{N−1} be the N codewords in the code and let s_0, s_1, ..., s_{N−1} be the corresponding clusters. By clusters about a set of points in a space, we mean the partitioning of the space according to the nearest neighbour rule. Now define

$$ d(s_i, c_i) = \sum_{x_j \in s_i} d_H(x_j, c_i) \qquad (1) $$
where d_H(x_j, c_i) is the Hamming distance between the vectors x_j and c_i. The set of all possible received vectors consists of all possible q-ary n-tuples. Let

$$ D_{tot} = \sum_{i=0}^{N-1} d(s_i, c_i) \qquad (2) $$

be the total distortion incurred if the set of codewords were used for encoding the space of all possible received vectors. Let r be the received vector when c_i is transmitted. If r ∈ s_i, i.e., if r belongs to the cluster around c_i, then there is no error; otherwise an error occurs while decoding. Now,

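$$ \begin{aligned} P_i(r \text{ is decoded correctly}) &= P(r \in s_i) = \sum_{r \in s_i} P(r \mid c_i) \\ &= \sum_{r \in s_i} (1 - p_e)^{n - d_H(c_i, r)}\, p_e^{\,d_H(c_i, r)} \\ &= (1 - p_e)^n \sum_{r \in s_i} \left( \frac{p_e}{1 - p_e} \right)^{d_H(c_i, r)} \end{aligned} \qquad (3) $$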

where p_e is the probability that a bit is in error for a binary symmetric channel and P(r | c_i) is the conditional probability that r is received given that c_i was transmitted. The probability of correct decoding is then given by

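$$ \begin{aligned} P_{corr} &= \frac{1}{N} \sum_{i=0}^{N-1} P_i(r \text{ is decoded correctly}) \\ &= \frac{(1 - p_e)^n}{N} \sum_{i=0}^{N-1} \sum_{r \in s_i} \left( \frac{p_e}{1 - p_e} \right)^{d_H(r, c_i)} \\ &= \frac{(1 - p_e)^n}{N}\, B\!\left( \frac{p_e}{1 - p_e} \right) \end{aligned} \qquad (4) $$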


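with the assumption that all N codewords are equiprobable, where n is the codelength and

$$ B(p) = b_0 p^0 + b_1 p^1 + \cdots + b_n p^n, \qquad (5) $$

$$ b_j = \sum_{i=0}^{N-1} \sum_{r \in s_i,\ d_H(r, c_i) = j} 1 = \sum_{i=0}^{N-1} \left| \{ r : r \in s_i \text{ and } d_H(r, c_i) = j \} \right| \qquad (6) $$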

i.e., b_j is the number of received vectors that lie at a distance j from the codeword c_i of the cluster s_i in which they fall, summed over all clusters. Hence the performance of the code is given exactly by (4), and assuming that p_e ≪ 1 it can be approximated as

$$ P_{corr} \approx \frac{1}{N} \left( b_0 p_e^0 + b_1 p_e^1 + b_2 p_e^2 + \cdots + b_n p_e^n \right) = \frac{1}{N} B(p_e). \qquad (7) $$
Now, the total distortion D_tot incurred if the codewords were used for encoding all possible received vectors is given by

$$ D_{tot} = 0 \cdot b_0 + 1 \cdot b_1 + 2 b_2 + \cdots + n b_n = \left. \frac{d}{dp} B(p) \right|_{p=1}. \qquad (8) $$

From the above equations for D_tot and P_corr it is clear that there exists a relationship between the total distortion and the probability of error, since both are determined by the same distance profile b_0, b_1, ..., b_n.

Consider again the example of the binary [7,4] Hamming code. Here we have N = 16, b_0 = 16 (16 vectors have distance 0 from some codeword), b_1 = 112 (the remaining vectors have distance 1 from some codeword) and b_2 = ... = b_7 = 0. Hence from (8) we find that D_tot = 112. Also P_corr = (1 − p_e)^7 + 7(1 − p_e)^6 p_e, which is the same as the expression obtained from (4).
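Both (4) and (8) are simple functions of the distance profile, so the Hamming-code example can be checked numerically. A minimal sketch in Python; the function names are illustrative, and the profile is the one given in the text:

```python
def p_correct(b, N, n, pe):
    """Eq. (4): exact probability of correct decoding on a binary symmetric channel."""
    x = pe / (1 - pe)
    return (1 - pe) ** n / N * sum(bj * x ** j for j, bj in enumerate(b))


def total_distortion(b):
    """Eq. (8): D_tot = sum_j j * b_j, i.e. the derivative of B(p) at p = 1."""
    return sum(j * bj for j, bj in enumerate(b))


b_hamming = [16, 112, 0, 0, 0, 0, 0, 0]  # distance profile b_0..b_7 of the binary [7,4] Hamming code
pe = 0.01
print(total_distortion(b_hamming))  # 112
print(p_correct(b_hamming, 16, 7, pe), (1 - pe) ** 7 + 7 * (1 - pe) ** 6 * pe)  # the two values agree
```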

In general the b_j's are very difficult to obtain analytically, and hence the analysis of codes designed for specific purposes is not very easy. The maximum value that b_j can have is

$$ b_{j,\max} = N \binom{n}{j} (q - 1)^j \qquad (9) $$

where N is the number of codewords in the code. If each b_j of a code takes either its maximum value or zero, then the performance of the code is optimum. Examples of such codes are the perfect codes, such as the Hamming and Golay codes.
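Each codeword has C(n, j)(q − 1)^j q-ary n-tuples at distance exactly j, which is where the bound (9) comes from. A short Python check that the two perfect Hamming codes mentioned in this issue attain the bound for j = 0 and j = 1 and thereby exhaust the whole space (the function name is illustrative):

```python
from math import comb


def b_max(N, n, q, j):
    """Eq. (9): N codewords, each with C(n, j) * (q - 1)**j q-ary n-tuples at Hamming distance exactly j."""
    return N * comb(n, j) * (q - 1) ** j


# Binary [7,4] Hamming code (N = 16, n = 7): b_0 and b_1 reach their maxima.
print(b_max(16, 7, 2, 0), b_max(16, 7, 2, 1))  # 16 112, which together cover all 2**7 = 128 tuples
# Ternary [4,2] Hamming code (N = 9, n = 4).
print(b_max(9, 4, 3, 0), b_max(9, 4, 3, 1))  # 9 72, which together cover all 3**4 = 81 tuples
```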

3. Deterministic annealing for code design

In this paper we use Deterministic Annealing (DA) (Rose et al 1990b), a stochastic relaxation clustering algorithm, for code design. Attempts to use other clustering algorithms may be found in Joseph (1994). DA makes use of an effective cost function that depends on a control parameter β. This cost function is deterministically optimized at each β, starting from a low value of β, which is then increased gradually. The probability that a particular input vector x belongs to a cluster C_j is calculated using

$$ P(x \in C_j) = \frac{e^{-\beta E_x(j)}}{\sum_{k=0}^{N-1} e^{-\beta E_x(k)}} \qquad (10) $$

where E_x(j) is the cost incurred if vector x is associated with cluster j. The cost function generally used is the squared Euclidean distance.
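Equation (10) is a Gibbs (softmax) distribution over clusters. A minimal Python sketch of how β controls it; the cost values are hypothetical:

```python
import math


def association_probabilities(costs, beta):
    """Eq. (10): P(x in C_j) = exp(-beta E_x(j)) / sum_k exp(-beta E_x(k))."""
    lowest = min(costs)  # subtracting the minimum cancels in the ratio and avoids underflow
    weights = [math.exp(-beta * (e - lowest)) for e in costs]
    total = sum(weights)
    return [w / total for w in weights]


costs = [1.0, 2.0, 4.0]  # hypothetical costs E_x(j) of one vector x for three clusters
for beta in (0.0, 1.0, 10.0):
    print(beta, [round(p, 3) for p in association_probabilities(costs, beta)])
```

At β = 0 every cluster receives probability 1/3, so all codewords coincide. As β grows, the association hardens toward the lowest-cost cluster, which is what lets codewords split during annealing.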

3.1 Annealing schedule

The performance of the deterministic annealing algorithm was first studied in order to gain insight into its working. The algorithm was used to obtain 16 codewords in binary 7-dimensional space with the squared error cost function and real arithmetic. While simulating the algorithm we used different numerical representations of the two binary symbols. Table 1 gives the results for the various cases. The table shows that as the magnitude of the numeric values increases, the β value above which the code is optimum decreases, i.e., the algorithm converges earlier in the annealing schedule. Figure 2 shows the history of a single run of the deterministic annealing algorithm, where we have plotted the number of iterations versus the β value. The graph shows regions of low activity and very high activity in terms of the number of iterations, which indirectly describes the stages the algorithm passes through. A large increase in the number of iterations at a particular β stage implies that the codewords are being split. Once this happens it takes some time for them to separate out, and hence increased activity is observed immediately after the codewords split. The algorithm was terminated once β became greater than 5.

Table 1. Performance of deterministic annealing with different numeric values for the binary symbols.

Binary symbols      β value above which
represented as      the code is optimum
0, 1                4.510
1, -1               0.796
2, -2               (not legible)
3, -3               0.0911
4, -4               0.042

The β schedule suggested by Rose et al (1990b) was that of a 10% increment between successive stages. It turns out that for the algorithm to converge, the factor by which β is increased should decrease with β. A linear β schedule has this property, and it was observed that the algorithm converges better with such a schedule. This may be intuitively explained as follows. To start with, the code consists of only a single distinct codeword. As β increases, this codeword splits into two or more codewords at appropriate values of β, and they separate out (Rose et al 1990a). As β increases along the annealing schedule, successive splits of the codewords occur at higher and higher β values. Hence if the β schedule proposed by Rose and coworkers were followed, then the difference between successive β stages would also grow as β increases. The philosophy behind the deterministic annealing algorithm is to follow the global minimum from one β stage to the next, assuming that the global minimum does not change much from one β stage to the next. If the increase in β from one stage to the next is considerable, then this assumption may not hold and the algorithm may move away from the global optimum. This is the reason for the improved performance of the algorithm with a linear annealing schedule with small step sizes.

Figure 2. Number of iterations versus β for a single run of DA.
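The difference between the two schedules can be seen from their step sizes. A minimal Python sketch, assuming a starting value of 0.05 (a geometric schedule cannot start at β = 0) and the termination value 5 used above; the function names are illustrative:

```python
def geometric_schedule(beta0, beta_max, factor=1.1):
    """Stages beta0, 1.1 beta0, 1.21 beta0, ...: the step grows in proportion to beta."""
    betas, beta = [], beta0
    while beta <= beta_max:
        betas.append(beta)
        beta *= factor
    return betas


def linear_schedule(beta0, beta_max, step):
    """Stages beta0, beta0 + step, ...: constant step, so the ratio between stages shrinks."""
    count = int(round((beta_max - beta0) / step))
    return [beta0 + k * step for k in range(count + 1)]


geo = geometric_schedule(0.05, 5.0)
lin = linear_schedule(0.05, 5.0, 0.05)
print(len(geo), geo[-1] - geo[-2])  # the last geometric step is far larger than the first
print(len(lin), lin[-1] - lin[-2])
```

Near β = 5 the 10% schedule moves by several tenths per stage, while the linear one keeps its small step, which is the property argued for above.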

We also notice from figure 2 that there is a direct relationship between the number of iterations at a particular β stage and the activity of the algorithm at that stage. If codewords are being split, then the number of iterations will be correspondingly high. The increased number of iterations allows the codewords to separate out once they split. Note that the splitting and separating out of the codewords depend on β. As the aim has been to track the global minimum of the objective function, codewords should be allowed to separate out immediately after the splitting, i.e., before β changes considerably. This intuitively suggests that in the neighbourhood of the β stages where the codewords split, increasing β by smaller amounts would help the codewords separate out. Hence, we made the β schedule dependent on the number of iterations at a particular β stage, i.e., we reduced the step size in the neighbourhood of the β values where the codewords start splitting. It was observed that this strategy improves the convergence of the algorithm.
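One way to make the step depend on the iteration count is sketched below in Python. The threshold, base step and shrink factor are hypothetical values, not the authors', and the stage optimizer is replaced by a stand-in that pretends codewords split while 1.0 ≤ β < 1.2:

```python
def next_beta(beta, iterations, base_step=0.05, busy_threshold=10, shrink=0.25):
    """Hypothetical rule: a stage that needed many iterations (codewords splitting) earns a smaller step."""
    step = base_step * shrink if iterations > busy_threshold else base_step
    return beta + step


def anneal(run_stage, beta_max, beta0=0.0, **rule):
    """Drive the schedule; run_stage(beta) optimizes at one beta and returns its iteration count."""
    beta, history = beta0, []
    while beta <= beta_max:
        iterations = run_stage(beta)
        history.append((beta, iterations))
        beta = next_beta(beta, iterations, **rule)
    return history


# Stand-in for a real DA stage: pretend codewords split while 1.0 <= beta < 1.2.
history = anneal(lambda beta: 30 if 1.0 <= beta < 1.2 else 2, beta_max=2.0)
steps = [b - a for (a, _), (b, _) in zip(history, history[1:])]
print(min(steps), max(steps))  # about 0.0125 inside the busy region, 0.05 elsewhere
```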

3.2 Incorporating Hamming distance measure 

We propose below an algorithm that can do clustering with integer arithmetic and Hamming distance as the cost function. In the modified algorithm we replace the cluster centroid criterion for codebook updating by a method based on probability. Here we calculate the probability P(x ∈ C_i) that an input vector x belongs to a particular cluster C_i using (10). Then with this probability we form an array M_{ij}^{(k)}, 0 ≤ i ≤ N − 1, 0 ≤ j ≤ n − 1, 0 ≤ k ≤ q − 1, which is then used for codebook updating. Once we have constructed the array M_{ij}^{(k)}, we update each symbol in the different codewords by the symbol having the maximum probability. The algorithm is described below.

1. Set β = 0 and choose the final value β_max.

2. Choose an arbitrary set of initial cluster centres.

3. Initialize M_{ij}^{(k)} = 0, 0 ≤ i ≤ N − 1, 0 ≤ j ≤ n − 1, 0 ≤ k ≤ q − 1.

4. Find the probability P(x ∈ C_i) that a training vector x belongs to a particular cluster C_i using (10), with Hamming distance as the cost measure.

5. Update the probability arrays M_{ij}^{(k)} corresponding to the different symbols in the training vector (as explained below).

6. Update the codebook with the symbols having the highest probability.

7. If β < β_max, perturb the codebook, increase β, and go to step 3. Else stop.

Consider any particular codeword, say the ith one, in the codebook. We calculate the probability that a training vector belongs to this cluster using (10), with Hamming distance as the cost measure. After calculating the probability, we give a weight equal to this probability to the symbols of the input vector while updating the codebook. We keep a probability array M_{ij}^{(k)}, where k corresponds to one of the q different symbols, i denotes the codeword index, and j is the codeword dimension index. Hence M has a size of q × N × n. After calculating the probability that a training vector belongs to a particular cluster, we update the probability array as follows. Taking the q symbols to be 0, 1, ..., q − 1, and the n components of x to be x_0, x_1, ..., x_{n−1}, we have

$$ M_{ij}^{(k)} = \sum_{x:\, x_j = k} P(x \in C_i). \qquad (11) $$

Once we have gone through the entire input sequence, we update the present codebook with the symbols having the highest probability in the respective positions,

$$ y_{ij} = \arg\max_{k} M_{ij}^{(k)}, \quad 0 \le i \le N - 1,\ 0 \le j \le n - 1, \qquad (12) $$

where y_{ij} is the jth symbol of the ith codeword. In the case of a tie we resolve it randomly. We tried different methods of updating the probability array, but none of them stood out as a clear-cut winner.
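Steps 1 to 7 can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' program. The training set is every q-ary n-tuple. The perturbation rule in `perturb` (replace each symbol with probability 0.1) is hypothetical, as is the cap of 20 updates per β stage. Ties in (12) are broken randomly, as in the text.

```python
import math
import random
from itertools import product


def hamming(a, b):
    """Number of positions in which two equal-length tuples differ."""
    return sum(x != y for x, y in zip(a, b))


def association_probs(x, codebook, beta):
    """Eq. (10) with Hamming cost: P(x in C_i) proportional to exp(-beta * d_H(x, c_i))."""
    costs = [hamming(x, c) for c in codebook]
    m = min(costs)  # shifting the costs cancels in the ratio but avoids underflow at large beta
    weights = [math.exp(-beta * (e - m)) for e in costs]
    z = sum(weights)
    return [w / z for w in weights]


def hamming_da_update(codebook, training, beta, q, rng):
    """Steps 3-6: accumulate M[i][j][k] as in (11), then pick the most probable symbol as in (12)."""
    N, n = len(codebook), len(codebook[0])
    M = [[[0.0] * q for _ in range(n)] for _ in range(N)]
    for x in training:
        p = association_probs(x, codebook, beta)
        for i in range(N):
            for j in range(n):
                M[i][j][x[j]] += p[i]
    new_codebook = []
    for i in range(N):
        word = []
        for j in range(n):
            best = max(M[i][j])
            ties = [k for k in range(q) if M[i][j][k] == best]
            word.append(rng.choice(ties))  # ties are resolved randomly
        new_codebook.append(tuple(word))
    return new_codebook


def perturb(codebook, q, rng, prob=0.1):
    """Hypothetical perturbation: replace each symbol by a random symbol with probability prob."""
    return [tuple(rng.randrange(q) if rng.random() < prob else s for s in c) for c in codebook]


def design_code(N, n, q, betas, max_iters=20, seed=0):
    """Anneal over the increasing list `betas`; the training set is every q-ary n-tuple."""
    rng = random.Random(seed)
    training = list(product(range(q), repeat=n))
    codebook = [tuple(rng.randrange(q) for _ in range(n)) for _ in range(N)]
    for stage, beta in enumerate(betas):
        for _ in range(max_iters):
            updated = hamming_da_update(codebook, training, beta, q, rng)
            if updated == codebook:
                break
            codebook = updated
        if stage < len(betas) - 1:  # step 7: perturb only if another beta stage follows
            codebook = perturb(codebook, q, rng)
    return codebook
```

At large β the update behaves like a hard nearest-neighbour vote per symbol position, so a perfect code such as the [7,4] Hamming code is left unchanged by it. The codebook returned by `design_code` depends on the seed and on the assumed perturbation rule.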

3.3 Simulation examples 

We used the algorithm to design codes with a given number of codewords M over binary 7-tuples and ternary 4-tuples. The motivation for designing codes for a given M is the following. As an example, consider the case where the input set consists of 10 messages (i.e., M = 10) and single-bit error correction is desirable. One possible method would be to use 10 codewords from the binary [7,4] Hamming code for this purpose, so that at the receiver we may correct any single error. Another alternative would be to design a code consisting of 10 codewords over binary 7-tuples and use it for the purpose. For values of M where an optimum code does not exist or is not known, this second alternative, designing a code using the proposed algorithm, might give improved performance over the first. The improved performance comes at the cost of extra computation, since there may not be any easier way of decoding the code than minimum distance decoding at the receiver.
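Minimum distance decoding of an unstructured code amounts to comparing the received word with every codeword. A minimal Python sketch, assuming the ten binary 7-tuples below are the C1 column of Table 2 (the column assignment is inferred from the extracted table):

```python
def min_distance_decode(received, codebook):
    """Return the codeword closest to `received` in Hamming distance (brute force, N * n comparisons)."""
    return min(codebook, key=lambda c: sum(a != b for a, b in zip(received, c)))


# Codewords of C1 as read from Table 2 (column assignment inferred from the extracted table).
C1 = ["1111101", "1010000", "0110010", "1011110", "1100100",
      "0011001", "0101110", "0000111", "0100001", "1001011"]

print(min_distance_decode("1111100", C1))  # 1111101: a single error in the last bit is corrected
```

The cost grows linearly with the number of codewords, which is the extra computation referred to above.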

Table 2 shows a code C1 consisting of 10 codewords designed using the proposed deterministic annealing algorithm for q = 2, n = 7. In table 3 we compare this code with the code C1' made up of 10 codewords from the binary [7,4] Hamming code. It may be noticed that C1 gives less total distortion in terms of Hamming distance than C1' when used for encoding the binary 7-dimensional space. Figure 3 gives the relative performance of the two codes in terms of probability of error, assuming a binary symmetric channel. From the figure it can be seen that C1 performs better than C1'. This implies that we can obtain improved performance by designing application-specific codes.

Table 2. Examples of codes designed for specific applications.

C1         C2         C3
1111101    1111110    0001
1010000    1100101    0212
0110010    0000110    1110
1011110    0110001    2121
1100100    0011100    2200
0011001    1010010    1022
0101110    1011001
0000111    1101011
0100001
1001011

Table 3. Comparison of codes C1', C1, C2', C2, C3' and C3, obtained from the corresponding Hamming code and designed using DA.

              10 codewords             8 codewords              6 codewords
              From [7,4]   Designed    From [7,4]   Designed    From ternary   Designed
              Hamming      using DA    Hamming      using DA    [4,2] code     using DA
              code (C1')   (C1)        code (C2')   (C2)        (C3')          (C3)
b0            10           10          8            8           6              6
b1            70           70          56           56          48             48
b2            40           48          48           63          24             27
b3            8            0           16           1           3              0
Total dist.   174          166         200          185         105            102
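The comparison can be reproduced from the distance profiles in Table 3 using (4). A minimal Python sketch; for the ternary codes it assumes a q-ary symmetric channel in which each wrong symbol is received with probability p_e/(q − 1), which reduces to (4) for q = 2:

```python
def p_correct(b, N, n, pe, q=2):
    """Eq. (4) from a distance profile b_0..b_n, generalized to a q-ary symmetric channel."""
    x = pe / ((q - 1) * (1 - pe))
    return (1 - pe) ** n / N * sum(bj * x ** j for j, bj in enumerate(b))


# (b_0..b_3, N, n, q) from Table 3; all other b_j are zero.
TABLE3 = {
    "C1'": ([10, 70, 40, 8], 10, 7, 2),
    "C1": ([10, 70, 48, 0], 10, 7, 2),
    "C2'": ([8, 56, 48, 16], 8, 7, 2),
    "C2": ([8, 56, 63, 1], 8, 7, 2),
    "C3'": ([6, 48, 24, 3], 6, 4, 3),
    "C3": ([6, 48, 27, 0], 6, 4, 3),
}


def compare(pe):
    """(name, P_corr of the DA code, P_corr of the Hamming-derived code) for each pair."""
    rows = []
    for name in ("C1", "C2", "C3"):
        b, N, n, q = TABLE3[name]
        b_ref, N_ref, n_ref, q_ref = TABLE3[name + "'"]
        rows.append((name, p_correct(b, N, n, pe, q), p_correct(b_ref, N_ref, n_ref, pe, q_ref)))
    return rows


for pe in (0.01, 0.1):
    for name, designed, reference in compare(pe):
        print(f"pe={pe}: P_corr({name}) = {designed:.6f}, P_corr({name}') = {reference:.6f}")
```

In each pair the DA code moves weight from b_3 to b_2 with b_0 and b_1 unchanged. The difference in P_corr is therefore proportional to x^2(1 − x), which is positive for every p_e below 1/2, and it matches the lower total distortion.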

Also shown in table 2 is a code C2 consisting of 8 codewords designed using the proposed deterministic annealing algorithm over binary 7-dimensional space. In table 3 we compare this code with a code C2' consisting of 8 codewords from the binary [7,4] Hamming code. Again, figure 3 compares the relative performance of the two codes in terms of probability of error, assuming a binary symmetric channel. It can be seen that C2 performs better than C2' in terms of probability of error.

Also given in table 2 is a code C3 consisting of 6 codewords over ternary 4-dimensional space. In table 3 we compare this code with a code C3' consisting of 6 codewords from the ternary [4,2] Hamming code. Here also the code obtained from the deterministic annealing algorithm gives less distortion when used for encoding the ternary 4-dimensional space, which implies that the code designed using the proposed algorithm performs better in terms of probability of error.

Figure 3. Performance of application specific codes designed using DA.

4. Conclusion 

In this paper we have established the relationship between clustering and code design, and have seen the need for an algorithm that can do clustering over finite-dimensional spaces with Hamming distance as the distance measure. We have chosen deterministic annealing, an efficient clustering algorithm with real arithmetic, for code design, and have proposed a modification to this algorithm which enables it to do clustering using Hamming distance as the distance measure. It was observed that the convergence of the algorithm can be improved by using a linear annealing schedule for β with decreasing step sizes at points where the codewords split. Some examples of codes designed for specific applications have also been produced. The performance of these codes with respect to probability of error has been compared with that of codes consisting of the corresponding number of codewords from the respective Hamming code. The codes designed for specific applications, although having equal minimum distance, perform better than the corresponding ad hoc codes obtained from the Hamming codes.
Abstract. Demodulation of Gaussian Minimum Shift Keying (GMSK) using a limiter-discriminator is a low complexity alternative to coherent demodulation. This so-called digital FM demodulation is followed by clock recovery, sampling, and thresholding. Conventionally, clock recovery is done in hardware, and matched filtering is usually not possible when the Gaussian pulse is wider than a bit duration. We propose a clock recovery technique based on discrete-time processing of the demodulated baseband signal. This technique couples very nicely with a new maximum likelihood sequence estimator for the data that uses a whitening filter followed by a Viterbi decoder. The entire detection algorithm can be implemented in an efficient manner on a Digital Signal Processor (DSP). Computer simulation results are presented to show that the new algorithm performs better than the conventional slicer by as much as 5.5 dB.

Keywords. Viterbi decoding; digital FM receiver; GMSK demodulation. 


1. Introduction

Gaussian Minimum Shift Keying (GMSK) is a popular bandwidth-efficient modulation scheme employed in many mobile and personal communication systems (GSM 1991; ETSI 1992). It permits the use of a variety of receivers, ranging from the coherent to the digital-FM type, each with a different complexity/performance trade-off. The transmitter can also be simplified depending on the type of receiver employed. A digital FM receiver, and a correspondingly simplified transmitter, is an extremely attractive low-complexity option. The digital FM modem has been known and used for many years now (Pawula 1981), but its performance has been quite poor when compared to the best possible receiver that one can implement, namely, a coherent receiver.

This paper describes an improved detection scheme for digital-FM reception of GMSK signals. The improvement is made feasible by the availability of Digital Signal Processors (DSPs). Since a DSP is almost always needed in modern communication systems for performing many tasks, it can also be used for implementing the modem algorithm. The improved algorithm is based entirely on digital signal processing techniques and requires only a small increase in complexity. Data clock synchronisation is achieved computationally using a correlator and interpolator. An improved bit error rate (BER) performance is obtained by replacing the slicer (based on simple thresholding) with a Viterbi detector which implements Maximum Likelihood Sequence Estimation (MLSE). This Viterbi Algorithm (VA) operates directly on Nyquist-rate samples of the demodulated baseband signal, and does not require a separate matched filter. Finally, it is shown that if an approximate noise-whitening filter is used prior to the VA, the BER performance can be further improved. § 2 describes GMSK and its attributes, and in particular, the digital-FM demodulator for it. In § 3, we discuss sampling of the received baseband signal, and a DSP-based clock recovery technique. The MLSE detector for the GMSK baseband signal is described in § 4. An approximate whitening filter is also considered for the coloured noise at the demodulator output. The paper concludes with simulation results, some conclusions based on those results, and an outline of related future work.

2. GMSK modulation and demodulation 

Wireless communication systems based on frequency division multiple access (FDMA) and/or time division multiple access (TDMA) require highly bandwidth-efficient modulation to minimise adjacent channel interference. Minimum Shift Keying (MSK) (Proakis 1989, ch. 3) is a nonlinear modulation scheme which provides significantly superior out-of-band power rejection when compared to linear modulation techniques with similar information (bit) rates. Further reduction of out-of-band power is possible by baseband pulse shaping. High bit-rate systems, such as those based on the GSM (GSM 1991) and DECT (ETSI 1992) standards, employ a Gaussian prefilter to meet the stringent specifications on adjacent channel interference. In this Gaussian MSK (Hirade & Murota 1981) system, a trade-off between out-of-band power and intersymbol interference (ISI) is possible by varying the BT product, where B is the 3-dB bandwidth of the Gaussian prefilter and T is the symbol (bit) period.

The GMSK modulator is illustrated in figure 1, where the transmitted bandpass signal 
s(t) is given by 

$$ s(t) = A_c \cos\left( 2\pi f_c t + \phi(t) + \phi_0 \right). \qquad (1) $$

Here A_c is the carrier amplitude, f_c the carrier frequency, φ_0 an arbitrary initial phase, and φ(t) a time-varying phase given by

$$ \phi(t) = \frac{\pi}{2T} \int_{-\infty}^{t} m(\lambda)\, d\lambda. \qquad (2) $$

The signal s(t) is thus an FM signal with message signal m(t), where in turn m(t) is given by

$$ m(t) = \sum_k I_k\, g(t - kT). \qquad (3) $$

The instantaneous frequency deviation of s(t) is proportional to m(t). In (3), {I_k} is a bipolar (random) information sequence and g(t) represents the convolution of a rectangular pulse of duration T seconds with the Gaussian prefilter, whose impulse response is of the form P exp(−αt²). (Note that if the Gaussian filter is absent, we obtain MSK.) Although this implies that g(t) is infinite in duration, g(t) can be truncated to a finite duration in practice, and an appropriate delay introduced to make the filter causal.

For example, when BT = 0.5, g(t) can be taken to be zero for t ∉ [0, T_g], where T_g = 3T. Therefore, an ISI contribution from just one preceding and one succeeding symbol distorts the transmitted pulse for the present symbol of m(t). A sample waveform for this example is shown in figure 2, where g(t) is for all practical purposes less than 3T in duration. Moreover, even when g(t) is truncated to a duration of 2T (i.e., T_g = 2T), the maximum error due to this approximation, which occurs at the peaks and valleys caused by the 010 and 101 bit patterns respectively, is about 6%.

Figure 1. GMSK modulator (block diagram: NRZ data → Gaussian filter → m(t) → VCO → s(t)).
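The pulse g(t) has a closed form in terms of the error function, which makes the truncation argument easy to check. A minimal Python sketch, assuming a unit-height rectangle and a unit-area Gaussian filter whose time-domain standard deviation is σ = √(ln 2)/(2πB), the value that places the filter's 3-dB point at B. The pulse is centred at t = 0 here; the causal pulse in the text is this one delayed by T_g/2.

```python
import math


def gmsk_pulse(t, BT=0.5, T=1.0):
    """g(t): unit-height rectangle of width T convolved with a unit-area Gaussian filter of 3-dB bandwidth B."""
    B = BT / T
    sigma = math.sqrt(math.log(2)) / (2 * math.pi * B)  # time-domain standard deviation of the filter
    s = math.sqrt(2) * sigma
    return 0.5 * (math.erf((t + T / 2) / s) - math.erf((t - T / 2) / s))


for t in (0.0, 0.5, 1.0, 1.5):  # in units of T
    print(f"g({t}T) = {gmsk_pulse(t):.5f}")
```

For BT = 0.5 the pulse at ±1.5T is of the order of 10^-4, consistent with a support of about 3T. At ±T it is still a few percent of the peak, which is why truncating to 2T introduces the small error quoted above.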

Note also that it is not necessary to implement the GMSK modulator as shown in figure 1. It is generally not easy to implement a linear voltage controlled oscillator (VCO) with precise control of its sensitivity and centre-frequency. The modulated output s(t) can be written in I-Q form using Rice's representation (Proakis 1993). The modulator can thus be implemented using a conventional balanced modulator with the in-phase and quadrature components as the baseband signals.

Both coherent and non-coherent demodulation of the GMSK signal are possible. A quadrature (I-Q) demodulation approach becomes obvious when we rewrite (1) as follows (assuming φ_0 = 0 for convenience):

$$ s(t) = A_c \cos\phi(t) \cos 2\pi f_c t - A_c \sin\phi(t) \sin 2\pi f_c t \qquad (4) $$

where A_c cos φ(t) and A_c sin φ(t) are the in-phase and quadrature-phase signals, respectively. If the nominal variations in the carrier frequency (during modulation) are kept small, a coherent quadrature demodulator is possible as in conventional QPSK systems. However, when the instantaneous frequency is allowed to deviate significantly (for example, the DECT standard (ETSI 1992) allows as much as 30% deviation from the ideal value), even differential detection of the I-Q data streams may not be possible. Such large errors in the instantaneous frequency are permitted in some standards in order to enable the use of a VCO-based GMSK modulator as shown in figure 1, which may not be able to maintain high accuracy. Such implementations can be very cost-effective, as, for example, a direct modulation transmitter, where the RF carrier is directly frequency-modulated. In fact, when the instantaneous phase is not maintained precisely according to (2), it is more appropriate to label the modulated signal as Gaussian Frequency-Shift Keying (GFSK) rather than GMSK.
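The I-Q form (4) lets the baseband signals be computed directly from the phase. A minimal Python sketch, assuming A_c = 1, φ_0 = 0, a modulation index of 1/2 (a phase advance of ±π/2 per bit, as in MSK) and rectangle-rule integration of m(t). The function name and sampling convention are hypothetical.

```python
import math


def baseband_iq(bits, pulse, sps):
    """In-phase and quadrature signals cos(phi), sin(phi) of (4) with A_c = 1 and phi_0 = 0.

    bits  : bipolar symbols I_k (+1 / -1)
    pulse : samples of g(t) taken every T / sps seconds, starting at t = 0
    Assumes phi(t) = (pi / 2T) * integral of m, i.e. a modulation index of 1/2.
    """
    m = [0.0] * ((len(bits) - 1) * sps + len(pulse))
    for k, symbol in enumerate(bits):  # m(t) = sum_k I_k g(t - kT), eq. (3)
        for n, g in enumerate(pulse):
            m[k * sps + n] += symbol * g
    phi, i_sig, q_sig = 0.0, [], []
    for sample in m:
        phi += (math.pi / 2) * sample / sps  # rectangle-rule integration with dt = T / sps
        i_sig.append(math.cos(phi))
        q_sig.append(math.sin(phi))
    return i_sig, q_sig, phi


# MSK special case: a rectangular g(t) of height 1 over one bit advances the phase by ±pi/2 per bit.
i_sig, q_sig, final_phase = baseband_iq([1, 1, -1, 1], [1.0] * 8, 8)
print(final_phase / math.pi)  # close to 1.0
```

Passing samples of the Gaussian-filtered pulse instead of the rectangle gives GMSK. In that case the ±π/2 phase change is spread over neighbouring bits.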

A low-complexity alternative to the quadrature demodulator for GMSK is the simple FM limiter-discriminator followed by a sampler and detection device. Such a receiver is referred to as a digital FM detector (Pawula 1981), and it goes hand-in-hand with a VCO-based modulator as shown in figure 1. This digital FM receiver is illustrated in figure 3, where x(t) is the noisy baseband signal. Note that, in the absence of noise, the limiter-discriminator output is the message signal m(t) given in (3). The lowpass filter at the discriminator output must have a bandwidth greater than the bandwidth of the Gaussian filter so that m(t) is not distorted. The remaining baseband processing tasks are (a) clock recovery and (b) data detection. In figure 3, the simple threshold detector (slicer) can be replaced by a matched filter or by a Viterbi detector implementing Maximum Likelihood Sequence Estimation in order to improve the performance.

Figure 3. Digital FM demodulator.
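In discrete time, the discriminator can be approximated by the phase step between successive complex baseband samples. The sketch below is a stand-in for the analog limiter-discriminator of figure 3, not the paper's receiver. It assumes complex baseband samples are available and that the phase advances by less than π per sample.

```python
import cmath
import math


def discriminator(z, fs):
    """Instantaneous frequency (Hz) from the phase step between successive complex baseband samples.

    The angle of cur * conj(prev) does not depend on |prev| or |cur|, so no explicit limiter is needed
    here; the phase step per sample must stay within (-pi, pi].
    """
    return [cmath.phase(cur * prev.conjugate()) * fs / (2 * math.pi) for prev, cur in zip(z, z[1:])]


fs = 1000.0
tone = [2.5 * cmath.exp(2j * math.pi * 50.0 * n / fs) for n in range(100)]
print(discriminator(tone, fs)[:3])  # each value is close to 50 Hz
```

Because only the angle is used, amplitude variations are ignored, which is the role the limiter plays in the analog receiver.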

It is also known (Haykin 1983) that if the SNR at the discriminator input is above a certain threshold, the output of the discriminator can be taken as m(t) plus an additive noise n(t). The output noise spectral density $S_n(f)$ is of the form (Haykin 1983, ch. 6)

$$
S_n(f) = \begin{cases} \dfrac{N_0}{2}\, f^2, & -W < f < W, \\[4pt] 0, & \text{otherwise}, \end{cases} \qquad (5)
$$


where 2W is the bandwidth of the bandpass filter at the limiter input and $N_0/2$ is the spectral height of the AWGN at the demodulator input. Such a parabolic noise spectral density implies that the noise is correlated from one symbol duration to the next.
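The correlation claimed above can be checked numerically. The short Python sketch below, a minimal illustration rather than part of the original analysis, evaluates the autocorrelation $R(\tau) = \int S_n(f)\,e^{j2\pi f\tau}\,df$ in closed form for a spectral density proportional to $f^2$ on $|f| < W$. The normalisation ($N_0/2 = 1$, $T = 1$) and the bandwidth $W = 0.75/T$ are assumptions chosen only for illustration, and the function name is hypothetical.

```python
import math

def parabolic_noise_autocorr(tau: float, W: float, n0_half: float = 1.0) -> float:
    """Autocorrelation R(tau) of noise whose PSD is n0_half * f**2 for |f| < W, 0 elsewhere.

    R(tau) = integral over [-W, W] of n0_half * f**2 * cos(2*pi*f*tau) df, in closed form.
    """
    if tau == 0.0:
        return n0_half * 2.0 * W**3 / 3.0
    a = 2.0 * math.pi * tau
    s, c = math.sin(a * W), math.cos(a * W)
    # F(f) = f^2 sin(af)/a + 2f cos(af)/a^2 - 2 sin(af)/a^3 is an antiderivative of
    # f^2 cos(af); F is odd, so F(W) - F(-W) = 2 F(W).
    return n0_half * 2.0 * (W**2 * s / a + 2.0 * W * c / a**2 - 2.0 * s / a**3)

T = 1.0          # bit (symbol) duration, normalised
W = 0.75 / T     # assumed noise bandwidth for this illustration
rho = parabolic_noise_autocorr(T, W) / parabolic_noise_autocorr(0.0, W)
print(f"correlation between noise samples one bit apart: {rho:.3f}")
```

With these assumed values the correlation coefficient between noise samples one bit apart comes out at roughly −0.58, far from zero, which is why successive noise samples cannot be treated as independent.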


3. DSP-based digital FM detector 

Detection of the data from the baseband signal, x(t), is accomplished by (a) recovering the
bit clock, and (b) sampling and thresholding (slicing). The sampling/thresholding step may 
be preceded by matched-filtering. Even though the noise is not white at the matched-filter 
input, such filtering typically improves the performance of the detector. Matched-filtering is 
easier to implement when the baseband pulse is designed so as to prevent ISI at the matched 
filter output, e.g., a rectangular pulse of duration T. If the pulse is not so designed, as is 
the case with GMSK, matched filtering cannot be performed on a bit-by-bit basis. Rather, 
we need a match to the entire received sequence. This is accomplished by the so-called 
maximum likelihood sequence estimator which can be implemented efficiently using the 
Viterbi algorithm. MLSE-based data detection is discussed in § 4. 

In the present section, we focus on the tasks of clock recovery and data detection when no filtering operation is used. Such an approach generally lends itself well to an all-hardware implementation. However, we propose a clock-recovery and data-detection algorithm based entirely on discrete-time signal processing. While this DSP implementation is comparable in complexity and performance to a hardware implementation, it becomes the more favourable approach when coupled with the more sophisticated MLSE-based detector.
3.1 Clock recovery

In any modem, clock recovery is performed for two reasons: (a) to play out the detected data at exactly the rate at which it is transmitted, and (b) to let the modem perform as well as possible under any given performance criterion.

Let us consider specifically a digital FM receiver based on sampling and thresholding the baseband signal x(t). We wish to recover a clock signal such that, when x(t) is sampled and thresholded with respect to this clock, the bit-error probability, $P_e$, is as small as possible. When there is ISI in x(t), $P_e$ for a given bit may vary depending on the values of the bits occurring before and after it. For example, in GMSK with BT = 0.5 or less, there is a significant amount of ISI. In such a case, one may wish to minimise either an average bit-error probability, $P_{ea}$, or the worst-case bit-error probability, $P_{ew}$, i.e., that of the bit whose error probability is highest due to ISI. The recovered clock must then be chosen so that $P_{ea}$ or $P_{ew}$, as desired, is minimised. In this paper, we choose $P_{ew}$ as the performance measure.

Let us assume, as is usual in most clock recovery schemes, that the clock frequency is known precisely and only its phase is unknown. Frequency error, if any, is assumed to be so small that it manifests only as a slowly changing clock phase, which can be tracked. Let $x_\alpha(nT)$ be the samples of x(t) taken at $t = nT + \alpha$. The nth bit is thus decided by the sign of $x_\alpha(nT)$.

We wish to choose α such that $P_{ew}$ is minimised. Since the noise variance is independent of time, $P_{ew}$ is minimised when α is chosen so as to maximise $\min_n |x_\alpha(nT)|$, the smallest noise-free sample magnitude. That is, the sampling clock is chosen such that the bit with the maximum ISI (and hence the worst $P_e$ for a given noise variance) has the maximum amplitude at the sampling instant. For example, the optimum sampling instants for a GMSK signal with BT = 0.5 are shown by cross-marks in figure 2.
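As a minimal sketch of the criterion above, the Python code below builds a noise-free discriminator output $\sum_k I_k\,g(t-kT)$ and searches a grid of M candidate offsets α for the one that maximises $\min_n |x_\alpha(nT)|$. It assumes that g(t) is a unit-height rectangular frequency pulse of width T passed through a Gaussian filter with the given BT (the usual GMSK frequency pulse, here centred at t = 0); the helper names and the test pattern are hypothetical.

```python
import math

def gmsk_freq_pulse(t: float, BT: float = 0.5, T: float = 1.0) -> float:
    """Unit-height rectangular pulse of width T (centred at t = 0) after a Gaussian
    filter with 3-dB bandwidth B = BT / T."""
    sigma = math.sqrt(math.log(2.0)) / (2.0 * math.pi * BT) * T   # Gaussian std. dev.
    d = math.sqrt(2.0) * sigma
    return 0.5 * (math.erf((t + T / 2) / d) - math.erf((t - T / 2) / d))

def baseband(bits, t: float, BT: float = 0.5, T: float = 1.0) -> float:
    """Noise-free discriminator output sum_k I_k g(t - kT), with I_k in {+1, -1}."""
    return sum(I * gmsk_freq_pulse(t - k * T, BT, T) for k, I in enumerate(bits))

def best_sampling_offset(bits, M: int = 10, BT: float = 0.5, T: float = 1.0):
    """Search alpha on an M-point grid in [-T/2, T/2) for the largest worst-case
    sample magnitude min_n |x_alpha(nT)| over the given bit pattern."""
    best_alpha, best_open = None, -math.inf
    for i in range(M):
        alpha = (i / M - 0.5) * T
        opening = min(abs(baseband(bits, n * T + alpha, BT, T)) for n in range(len(bits)))
        if opening > best_open:
            best_alpha, best_open = alpha, opening
    return best_alpha, best_open

pattern = [1, -1, 1, -1, 1, 1, 1, -1, -1, 1, -1, 1]
alpha, opening = best_sampling_offset(pattern)
print(f"best offset: {alpha:+.2f} T, worst-case amplitude: {opening:.3f}")
```

For BT = 0.5 and a pattern that contains alternating bits, the search picks α = 0, the centre of the pulse, where each neighbouring bit contributes only about 3% ISI.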

3.2 Tracking of clock phase 

The design of many modems includes a training sequence or preamble that enables the receiver, among other things, to synchronise itself with the transmit clock phase. The maximum-likelihood estimator of clock phase for a known signal in white noise is the correlator (Ziemer & Peterson 1985, ch. 6). The correlator, whose output is maximum when the input is aligned with its own stored copy of the known signal, is often employed even when the noise is not white. In order to obtain the optimal clock-phase estimate that minimises $P_{ew}$, the preamble must have a correlation peak at the optimal time instant. This is achieved by using a preamble with an appropriately chosen periodic data pattern. An example is shown in figure 4 for a GMSK signal with BT = 0.5. An alternating one-zero data pattern ensures that the correlator output peaks at the instants when x(t) reaches its largest value within a bit duration in which the ISI is greatest.
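The correlator-based phase estimate can be sketched as follows in Python. The sketch assumes a receiver that holds M samples per bit and a stored copy of an alternating one-zero preamble which, purely for simplicity, is approximated by the sinusoid cos(πn/M) with period 2T rather than by the exact GMSK waveform; the preamble length, noise level and names are hypothetical. Because the pattern repeats every two bits, only lags within one period (2M samples) are searched.

```python
import math
import random

def correlate_preamble(received, template, max_lag):
    """Return the lag in [0, max_lag) at which sum_n received[lag + n] * template[n] is largest."""
    best_lag, best_val = 0, -math.inf
    for lag in range(max_lag):
        val = sum(received[lag + n] * t for n, t in enumerate(template))
        if val > best_val:
            best_lag, best_val = lag, val
    return best_lag

M = 8                      # samples per bit (assumed oversampling for this sketch)
P = 16                     # preamble length in bits (assumed)
# Assumption: for an alternating 1-0 preamble the baseband signal is close to a sinusoid
# with period 2T, so the stored copy is approximated by cos(pi * n / M).
template = [math.cos(math.pi * n / M) for n in range(P * M)]

true_delay = 5             # unknown timing offset in samples (0 <= true_delay < 2M)
rng = random.Random(1)
received = [0.0] * true_delay + template + [0.0] * (2 * M)
received = [r + 0.1 * rng.gauss(0.0, 1.0) for r in received]

estimate = correlate_preamble(received, template, 2 * M)
print("estimated delay (samples):", estimate)
```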

In continuously operating modems, after an initial training period, the receiver tracks the slowly varying clock phase. This tracking is performed using any of a number of well-known techniques, such as the early-late method (Ziemer & Peterson 1985, ch. 6). For the digital FM receiver, the tracking-loop error signal must be updated only for those bits with maximum ISI. In modems for bursty transmission, as in TDMA systems, every burst usually includes a preamble for estimating the clock phase. Further, since the clock phase will not vary significantly during the burst, no tracking is needed while detecting data in that burst.



Figure 4. Alternating one-zero pattern for clock recovery
Phase-locked/tracking loops are conventional hardware implementations of the clock recovery techniques discussed above. We next consider a completely algorithmic DSP-based implementation but, prior to that, we discuss the sampling of x(t) for discrete-time processing.


3.3 Sampling of the baseband signal 

In most digital modulation schemes, the bandwidth of the baseband signal is some value in the range [R/2, R], where R is the symbol rate. In selecting the sampling rate, $f_s$, it is convenient to choose a multiple of the symbol rate that also satisfies the Nyquist criterion. Typically, a sampling rate of 2R is sufficient when the bandwidth of x(t) is in the range mentioned above. Most detection algorithms are executed once every symbol, and a sampling rate that is a multiple of R simplifies the algorithm considerably.

In the case of GMSK, a sampling rate of 2R is suitable for all values of the Gaussian filter bandwidth B, even if it is somewhat generous when BT is small. It is assumed, of course, that the noise is also bandlimited to a bandwidth less than $f_s/2$. This is usually accomplished at the intermediate frequency (IF) stage, where the IF filter that provides the selectivity also bandlimits the noise.

3.4 Clock recovery in discrete-time 

Let x(t) be sampled at rate $f_s = 1/T_s = 2R = 2/T$, and let the samples be denoted $x(nT_s)$. Let $T_r = T_s/M$ be the time resolution with which we perform clock-phase estimation. Here, M can be chosen as large as required; typically, values in the range 5 to 10 suffice.

Clock-phase estimation amounts to determining which of the M instants $T(n,i) = nT_s + iT_r$, $i = 0, 1, 2, \ldots, M-1$, is the best sampling instant. In a burst modem, the optimum instant $T(n,j)$ is determined at the beginning, and j usually does not change during the burst (unless M is large). In a continuously operating modem, the clock is tracked and the value of j in $T(n,j)$ changes slowly with n.

Since we sample at a rate greater than the Nyquist rate, correlation of the preamble can be performed in discrete time. The correlator output is interpolated at the M − 1 instants $T(n,i) = nT_s + iT_r$, $i = 1, 2, \ldots, M-1$, in between each pair of output samples. For this task, M − 1 interpolators are needed. The maximum of the interpolated samples, occurring, say, at $T(n,j)$, gives the value j corresponding to the best sampling clock (for a time resolution of $T_r$). The jth interpolator is then employed to interpolate x(t) at $t = nT_s + jT_r$, and the sign of the interpolated value indicates the data bit.
Thus, clock recovery in discrete time is tantamount to determining which of M interpolators to employ, where M determines the resolution of the estimate. If the clock phase is being tracked, the value of j will change slowly, increasing or decreasing by unity every once in a while. When it reaches 0 (or M), a sample of the input $x(nT_s)$ is repeated (or dropped) in order to adjust the sampling rate to precisely 2R, where R is the bit rate. The interpolators may be designed using any conventional approach (Vaidyanathan 1993).
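A bank of M fractional-delay interpolators and the index wrap-around described above might be sketched as follows. Since the text leaves the interpolator design open, this Python sketch assumes four-tap cubic Lagrange interpolators (one per phase $iT_r$, $i = 0, \ldots, M-1$); the function names are hypothetical, and any other conventional design could be substituted.

```python
from typing import List, Sequence, Tuple

def lagrange_bank(M: int) -> List[List[float]]:
    """Coefficients of M four-tap cubic Lagrange interpolators, one per fractional delay
    mu = i / M (i = 0, ..., M - 1), applied to samples x[n-1], x[n], x[n+1], x[n+2]."""
    bank = []
    for i in range(M):
        mu = i / M
        bank.append([
            -mu * (mu - 1) * (mu - 2) / 6,
            (mu + 1) * (mu - 1) * (mu - 2) / 2,
            -(mu + 1) * mu * (mu - 2) / 2,
            (mu + 1) * mu * (mu - 1) / 6,
        ])
    return bank

def interpolate(x: Sequence[float], n: int, j: int, bank: List[List[float]]) -> float:
    """Estimate x(t) at t = n*Ts + j*Tr (Tr = Ts / M) using the j-th interpolator."""
    h = bank[j]
    return sum(h[k] * x[n - 1 + k] for k in range(4))

def advance_phase(n: int, j: int, step: int, M: int) -> Tuple[int, int]:
    """Move the interpolator index j by step (+1 or -1). When j passes M - 1 (or 0),
    the base sample index moves by one: an input sample is dropped (or repeated)."""
    j += step
    if j >= M:
        return n + 1, j - M
    if j < 0:
        return n - 1, j + M
    return n, j

bank = lagrange_bank(8)
x = [n**3 - 2 * n for n in range(10)]          # a cubic is reproduced exactly
print(interpolate(x, 5, 3, bank), 5.375**3 - 2 * 5.375)
```

A cubic Lagrange interpolator reproduces any cubic polynomial exactly, which gives a convenient self-check; `advance_phase` shows how stepping j past M − 1 (or below 0) shifts the base sample index, the discrete-time equivalent of dropping (or repeating) an input sample.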

4. Digital FM Detection using the VA 

When digital FM detection is based on sampling once per bit and slicing, we are not 
exploiting the baseband signal over the entire bit duration. Although the noise is not white, 
and matched filtering is therefore not optimal, such filtering would nevertheless lead to 
some improvement in performance. However, matched filtering cannot be performed on 
a bit-by-bit basis when ISI is present at the matched filter input. In this case, an entire 
sequence of bits has to be matched to, i.e., we have to perform MLSE. 

If the noise is white, the optimal MLSE of an N-bit data sequence is defined as follows (Proakis 1989, ch. 6). Let r(t) be the received signal and $x^{(i)}(t)$, $i = 0, 1, \ldots, 2^N - 1$, be the transmitted signal for the ith data sequence (of the $2^N$ possible N-bit sequences). We decide in favour of the jth sequence if the Euclidean distance $D_j < D_i$ for all $i \neq j$, where

$$
D_i = \int_0^{NT} \left[ r(t) - x^{(i)}(t) \right]^2 dt. \qquad (6)
$$

This distance can be computed equivalently from the sampled sequences $r(nT_s)$ and $x^{(i)}(nT_s)$ as follows:

$$
D_i = \sum_{n=0}^{2N-1} \left[ r(nT_s) - x^{(i)}(nT_s) \right]^2, \qquad (7)
$$

where we have assumed that there are two samples per bit. Note also that $D_i$ can be obtained from any set of uniformly spaced samples $r_\alpha(nT_s) = r(nT_s + \alpha)$ and $x_\alpha^{(i)}(nT_s) = x^{(i)}(nT_s + \alpha)$ for any $\alpha \in [0, T_s]$, i.e.,

$$
D_i = \sum_{n=0}^{2N-1} \left[ r_\alpha(nT_s) - x_\alpha^{(i)}(nT_s) \right]^2
    = \sum_{k=0}^{N-1} \sum_{l=0}^{1} \left[ r_\alpha\big((2k+l)T_s\big) - x_\alpha^{(i)}\big((2k+l)T_s\big) \right]^2. \qquad (8)
$$

From the above we see that $D_i$ can be accumulated on a bit-by-bit basis, the inner sum (over l) being the metric for the kth bit. Thus, determination of the minimum distance, $D_j$, can be performed using the Viterbi algorithm (Proakis 1989). The sampled baseband signal $x_\alpha^{(i)}(nT_s)$ is given by

$$
x_\alpha^{(i)}(nT_s) = \sum_{k=0}^{N-1} I_k^{(i)}\, g(nT_s + \alpha - kT), \qquad (9)
$$

where $\{I_k^{(i)}\}_{k=0}^{N-1}$ is the N-bit binary expansion of the ith sequence. Now, if g(t) is non-zero only in $[0, T_g]$, then (dropping the index i for convenience) the even and odd samples in the mth bit duration are given by



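$$
x_\alpha(2mT_s) = \sum_{k=m-L+1}^{m} I_k\, g\big(2(m-k)T_s + \alpha\big) \qquad (10)
$$

and

$$
x_\alpha\big((2m+1)T_s\big) = \sum_{k=m-L+1}^{m} I_k\, g\big(2(m-k)T_s + T_s + \alpha\big), \qquad (11)
$$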

respectively, where $L = \lceil T_g/T \rceil$ specifies the number of symbols contributing to the signal amplitude at the sampling instant. Here, the function $\lceil \cdot \rceil$ (ceil) rounds up to the nearest integer. Depending on the value of α, the symbol $I_{m-L+1}$ may or may not be included in one of the sums in (10) and (11). In either case, the metric for the mth bit depends on at most the L bits $I_{m-L+1}, \ldots, I_m$, so the number of states in the trellis is $2^{L-1}$. A state is labelled by the L − 1 bits $I_{m-L+1}, \ldots, I_{m-1}$. Two paths radiate from each state, depending on the value of $I_m$, and each state is also entered by two paths. For example, a trellis diagram for L = 2 is shown in figure 5.
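To make the trellis concrete for L = 2, the Python sketch below implements a two-state Viterbi detector whose branch metric is the inner sum of (8), i.e., the squared error on the two samples of the current bit, with each expected sample formed from the current and previous bits. It assumes that the bit preceding the block ($I_{-1}$) is known, and the sample weights (0.94 and 0.03 on one sample, 0.5 and 0.5 on the other) are illustrative values loosely inspired by the BT = 0.5 case discussed in § 5, not values from the paper; all names are hypothetical.

```python
import math
import random
from typing import List, Sequence, Tuple

def viterbi_two_state(r: Sequence[float], g_even: Tuple[float, float],
                      g_odd: Tuple[float, float], first_prev: int = 1) -> List[int]:
    """Two-state Viterbi detector (L = 2) using two samples per bit.

    r          -- received samples r[2m] (even) and r[2m+1] (odd) for bits m = 0, 1, ...
    g_even     -- (weight of I_m, weight of I_{m-1}) in the even sample
    g_odd      -- (weight of I_m, weight of I_{m-1}) in the odd sample
    first_prev -- the bit I_{-1} (+1 or -1), assumed known, e.g. the last preamble bit
    The trellis state is the previous bit I_{m-1}.
    """
    states = (-1, 1)
    metric = {s: (0.0 if s == first_prev else math.inf) for s in states}
    paths = {s: [] for s in states}
    for m in range(len(r) // 2):
        re, ro = r[2 * m], r[2 * m + 1]
        new_metric, new_paths = {}, {}
        for cur in states:                  # cur = I_m, which becomes the next state
            best_prev, best = states[0], math.inf
            for prev in states:             # each state is entered by two paths
                xe = cur * g_even[0] + prev * g_even[1]
                xo = cur * g_odd[0] + prev * g_odd[1]
                cand = metric[prev] + (re - xe) ** 2 + (ro - xo) ** 2  # bit metric
                if cand < best:
                    best_prev, best = prev, cand
            new_metric[cur] = best
            new_paths[cur] = paths[best_prev] + [cur]
        metric, paths = new_metric, new_paths
    return paths[min(states, key=lambda s: metric[s])]

def synthesize(bits, g_even, g_odd, first_prev=1, sigma=0.0, seed=0):
    """Hypothetical test signal: two noisy samples per bit following the same model."""
    rng = random.Random(seed)
    r, prev = [], first_prev
    for b in bits:
        r.append(b * g_even[0] + prev * g_even[1] + sigma * rng.gauss(0.0, 1.0))
        r.append(b * g_odd[0] + prev * g_odd[1] + sigma * rng.gauss(0.0, 1.0))
        prev = b
    return r

G_EVEN = (0.94, 0.03)   # illustrative: about 3% ISI on this sample
G_ODD = (0.50, 0.50)    # illustrative: about half of this sample comes from I_{m-1}
bit_rng = random.Random(7)
bits = [bit_rng.choice((-1, 1)) for _ in range(40)]
r = synthesize(bits, G_EVEN, G_ODD, sigma=0.1, seed=3)
decided = viterbi_two_state(r, G_EVEN, G_ODD)
print("bit errors:", sum(d != b for d, b in zip(decided, bits)))
```

Because every branch metric involves only the current and previous bit, the detector keeps just two survivors; with the illustrative noise level σ = 0.1 used here, the 40 test bits are recovered without error.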

There are two equivalent options for implementing the VA. In both cases, timing recovery is performed separately, as discussed in § 3. In the first option, the trellis is populated with samples according to (10) and (11) for a particular value of α (say α = 0). Based on the estimate of α obtained from timing recovery, an appropriate interpolator is employed to interpolate the samples of r(t) at $t = nT_s$. These interpolated samples are then employed in the VA. Note that both samples in each bit interval must be interpolated. In the second option, $x(2mT_s)$ and $x((2m+1)T_s)$ are pre-computed for M different values of α in $[0, T_s)$ (where $M = T_s/T_r$; see § 3.4). Depending on the estimate of α, the appropriate set of samples is used to populate the VA for as long as the estimate of α remains unchanged. The second option saves computation, as run-time interpolation of $r(nT_s)$ is avoided.

4.1 Whitening filter for improved performance 

The MLSE is optimal only for signals in white Gaussian noise. If the noise is Gaussian but coloured, and the spectral density is strictly positive within the band of interest, then the optimal detector can be implemented as a cascade of a whitening filter and the MLSE (Wozencraft & Jacobs 1965, ch. 7). The pulse shape g(t) is replaced by $g_w(t) = g(t) * h(t)$, where h(t) is the impulse response of the whitening filter and * represents the convolution operation. Since the duration of $g_w(t)$ will, in general, be longer than that of g(t), the number of states in the trellis may increase. The input $r(nT_s)$ is convolved with $h(nT_s)$, and the output is fed to the VA. The trellis is populated with samples of x(t) as in (10) and (11), with g(·) replaced by $g_w(\cdot)$ and with L now determined by the duration of $g_w(\cdot)$. Either option may be exercised in implementing the VA, as discussed above.
In the case of digital FM, the noise spectral density is parabolic with a null at f = 0. As such, this noise cannot be whitened exactly. Approximate whitening filters, which are essentially lowpass filters with a magnitude response proportional to 1/|f| as |f| increases from zero, can be employed. A low-order FIR filter is preferable so that the number of states in the trellis does not increase significantly.


5. Simulation results and conclusions 


In this section, we compare the simulated BER performance of the digital FM demodulator for three detectors: the slicer, the sequence estimator without a whitening filter, and the MLSE with an approximate whitening filter. GMSK with BT = 0.5 was chosen for the simulations. The impulse response of the Gaussian shaper, g(t), was approximated as time-limited to [0, 2T] while generating the received samples. This implies that, even when perfect timing recovery leading to sampling (or interpolation) at the optimal instants is assumed, there still exists a 3% ISI contribution from adjacent symbols $I_{k-1}$ and $I_{k+1}$ on the even-numbered sample corresponding to symbol $I_k$. Note that the ISI contribution on the odd-numbered sample of $I_k$ from $I_{k-1}$ equals about 50% of the signal amplitude.

In constructing the VA, we neglect the small ISI contribution to the even-numbered sample from symbol $I_{k+1}$ in order to retain a 2-state trellis. It should also be mentioned here that when there is no whitening filter, a 2-state trellis can always be ensured (independently of the sampling instants) by correctly grouping the samples corresponding to the same bit interval. This is clear from (10) and (11). Either of the two options described in § 4 can be exercised in implementing the VA, where the option without run-time interpolation is more computationally efficient.

Figure 7. BERs for different data sequences in white noise

When a whitening filter is present, we know from (10) and (11) that the number of states in the trellis is determined by the duration of $g_w(t)$, the pulse at the output of the whitening filter. Further, depending on the value of α, one of the samples in each bit duration may depend on only L − 2 past bits. The number of states in the trellis can therefore differ by a factor of two, depending on α and also on whether the whitening-filter order is odd or even. Thus, by employing interpolated sample values, it becomes possible to use a higher-order whitening filter for the same VA complexity. In other words, run-time interpolation helps in obtaining a superior BER performance for the same VA complexity, when compared to using the samples directly with a lower-order whitening filter. Therefore, a trade-off exists between the computational complexity of the VA and that of the run-time interpolator when a whitening filter is introduced before the VA.

Figure 6 compares the BER of the slicer with that of a 2-state VA for AWGN. The AWGN case is reported because the result can be verified with theory. The theoretical BER curve for antipodal signalling, namely $\tfrac{1}{2}\operatorname{erfc}\big(\sqrt{E_b/N_0}\big)$, where erfc(·) is the complementary error function, is also shown. For the 2-state VA, at high SNRs, the BER is dominated by the single-bit error event, resulting in a Euclidean distance of approximately $6A_m^2$ (where $A_m$ is the maximum amplitude during the bit). When compared to the slicer, which has a Euclidean distance of $4A_m^2$, the VA thus has a 1.8 dB advantage in SNR. Indeed, this 1.8 dB gain is evident in figure 6 when the SNR is large.
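The 1.8 dB figure can be checked with a short calculation. Assume, consistent with the sample amplitudes described earlier in this section, that one sample of each bit carries ±A_m with negligible ISI while the other carries about A_m/2 from each of two adjacent bits, and read the distances above as squared distances in the sense of (7). A single-bit error then changes the full-amplitude sample by 2A_m and two half-amplitude samples by A_m each, whereas the slicer observes only one sample per bit:

$$
D_{\text{VA}} \approx (2A_m)^2 + A_m^2 + A_m^2 = 6A_m^2, \qquad D_{\text{slicer}} = (2A_m)^2 = 4A_m^2,
$$

$$
10\log_{10}\frac{6A_m^2}{4A_m^2} = 10\log_{10} 1.5 \approx 1.76\ \text{dB}.
$$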

Since the instantaneous bandwidth of the GMSK signal is a function of the actual bit sequence, we are interested in studying the performance of the sequence that occupies the largest bandwidth (the alternating one-zero pattern) and the one that occupies the least (the all-one or all-zero pattern). The BER performance of these particular sequences is compared to that of the random (message) sequence in figure 7, for the VA in white noise. As expected, the performance of the random data sequence lies in between the performance for the sequences that use the maximum and minimum channel bandwidth.

Figure 8 compares the BER for these three different data sequences in coloured Gaussian noise with a parabolic spectral density as in (5) for $-0.75/T < f < 0.75/T$ and with a cosine roll-off up to $\pm W$ (to approximate the roll-off due to the pre-detection lowpass filter). The total noise power was kept the same as in the white-noise case. The performance of the slicer is independent of the data sequence and the noise spectral density, and is nearly identical to the result shown in figure 6. With the VA, we find that the BER improves in all cases when the noise is coloured. However, since the same noise power is distributed in a parabolic fashion, the low-frequency, all-one sequence experiences a greater reduction in BER relative to the random sequence than in the white-noise case.

Figure 9. BER performance of the VA with whitening filter

Finally, figure 9 compares the performance of the VA when a 4-tap or a 6-tap FIR filter is introduced to approximately whiten the coloured noise. The magnitude responses of these filters were chosen using a numerical search technique so that the noise PSD at the output is approximately flat over the band $-0.75/T < f < 0.75/T$. While the resulting 4-tap whitener leads to a 4-state VA, the 6-tap whitener increases the number of VA states to eight. In both cases, the input samples were interpolated based on the recovered clock in order to minimise the number of states. The BER curves of the slicer and the 2-state VA for the coloured-noise case are also shown in figure 9.

Observe that even with a 4-tap whitener, a gain of about 5.5 dB in SNR (at BER = $10^{-4}$) is possible over the slicer. For an 8-state VA corresponding to a 6-tap whitener, an additional 0.8 dB gain is accrued. Increasing the whitening-filter order (and the VA complexity) further yields only marginal SNR improvement.

The assumption of an additive-noise model with parabolic spectral density is valid only when the SNR at the demodulator input is above the threshold. This threshold varies between approximately 10 and 14 dB, depending on the type of demodulator (I-Q or discriminator). At SNR levels near or below threshold, an additional “click-noise” component must be included in the model (Mazo & Salz 1966). Work remains to be done on determining the performance of the VA-based detector when click noise is also present.

Another interesting extension of this work is the equalisation of ISI channels. Multipath propagation leading to fading and ISI is a common occurrence in mobile/personal communication systems. A model is needed for the FM-demodulated baseband signal when the received signal has multipath interference. Based on this model, the VA-based detector can be modified to include the effect of distortion of the baseband pulse. The authors are currently pursuing work along these directions.
Novel techniques in high speed modem implementation
using programmable DSPs

Hyderabad, India 500 482

Abstract. We present the design and performance of a high speed MSK 
modem that incorporates digital heterodyne processing. A novel table look-up 
procedure for direct IF/RF modulation, based on a suitable maximal-length 
shift register sequence, is described. High-performance, linear-phase, low-complexity mixing and filtering in the receiver are achieved using multirate IFIR filters, where the shaping filter may be Mth-band. The configuration
of the IFIR filters is chosen based on the carrier frequency and the filtering 
requirements. 

Keywords. Minimum shift keying; direct IF/RF modulation; multirate IFIR 
filters. 


1. Introduction 

There is an ever-increasing demand for low-cost, high-speed data communication systems in a variety of applications such as VSATs, PCS, radio paging and utility metering. In this paper, we present design and implementation details of a Minimum Shift Keying (MSK) modem using a novel transmitter and receiver structure. The techniques presented herein exploit the features and architecture of modern low-cost DSP chips. Our algorithms, implemented on a modern low-cost 33 MIPS processor, allow data rates of up to 384 kbps and carrier frequencies of up to 1 MHz.

In recent years, MSK (Pasupathy 1979; Austin et al 1983) and its variants (in general, continuous-phase frequency-shift keying, CPFSK) have become increasingly popular modulation techniques for signalling through bandwidth- and amplitude-limited channels, in which data must be efficiently packed into a restricted bandwidth. In the case of MSK, 99% of the signal power is contained in a bandwidth of about 1.2R (where R is the bit rate), as compared to 8.0R for QPSK and OQPSK. It is also easier to filter an MSK signal because of its relatively smaller side lobes. A key advantage of MSK over QPSK is its phase continuity. As a result, the inter-symbol interference caused by nonlinear amplifiers or hard limiting in satellite applications may be avoided. Further, MSK allows the use of efficient digital IFIR filtering, which in turn reduces the complexity of the analog filters used in the RF sections.

This paper presents a novel technique for direct IF/RF MSK modulation using a look-up table generated from a maximal-length shift register. It is shown that the complexity of the modulator decreases with increase in the number of bits taken for modulation at a time, N, at the expense of memory. Choosing N to be even reduces the complexity of the modulator, as the start phase then has only two states for every N-bit input. This look-up table scheme¹ can further be generalized to GMSK modulation (Jayasimha & Harinath Reddy 1995), as used in GSM and DECT. Multirate IFIR filtering techniques (Neuvo et al 1984) are exploited for low-complexity heterodyne processing in the receiver.

Figure 1. Block diagram of MSK modem.

Section 2 of this paper defines terms and describes the modem architecture. Section 3 provides implementation details of the IF/RF MSK modulator using look-up tables, heterodyne processing using multirate IFIR filters, the digital matched filter and synchronization processing. Section 4 provides performance results of the modem in the presence of additive Gaussian noise.

2. Modem configuration 

A block diagram of the modem is shown in figure 1. In MSK, the in-phase (I) and quadrature-phase (Q) data streams are skewed by a T/2 duration in the modulator and are half-cycle sinusoidally weighted (Austin et al 1983).

The mathematical expression for the output of the MSK transmitter is (Pasupathy 1979)

$$
S(t) = a_I(t)\cos\!\left(\frac{\pi t}{2T}\right)\cos(2\pi f_c t) + a_Q(t)\sin\!\left(\frac{\pi t}{2T}\right)\sin(2\pi f_c t), \qquad (1)
$$

where T is the bit duration and $f_c$ is the carrier frequency.

The above equation can also be written as

$$
S(t) = \cos\!\left[2\pi f_c t + b_k(t)\,\frac{\pi t}{2T} + \phi_k\right]. \qquad (2)
$$

Here $b_k$ is +1 if $a_I$ and $a_Q$ have opposite signs and −1 if $a_I$ and $a_Q$ have the same sign, and $\phi_k$ is 0 or π corresponding to $a_I = 1$ or −1, respectively. Note that $b_k(t)$ can also be written as $b_k(t) = -a_I(t)\,a_Q(t)$.

The signalling frequencies in MSK from (2) are $f_h = f_c + f_b/4$ and $f_l = f_c - f_b/4$, where $f_b = 1/T$ is the bit rate. Hence the frequency separation is equal to half the bit rate, i.e., $f_h - f_l = f_b/2$. This is the minimum frequency spacing that allows the two FSK signals to be coherently orthogonal, hence the name “Minimum Shift Keying”.
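The orthogonality claim is easy to verify numerically. The Python sketch below (an illustration with a hypothetical helper name) approximates $\int_0^T \cos(2\pi f_h t)\cos(2\pi f_l t)\,dt$ by the midpoint rule for the MSK spacing $f_b/2$ and for a smaller spacing of $0.4 f_b$. It assumes zero initial phase for both tones and a carrier $f_c = 9f_b/4$, an integral multiple of $f_b/4$, which also makes the sum-frequency term vanish.

```python
import math

def tone_correlation(f1: float, f2: float, T: float = 1.0, n: int = 20000) -> float:
    """Midpoint-rule estimate of the integral over [0, T] of cos(2*pi*f1*t) * cos(2*pi*f2*t)."""
    dt = T / n
    return sum(math.cos(2 * math.pi * f1 * (k + 0.5) * dt) *
               math.cos(2 * math.pi * f2 * (k + 0.5) * dt) for k in range(n)) * dt

fb = 1.0                   # bit rate (normalised), so T = 1 / fb
fc = 9 * fb / 4            # assumed carrier: an integral multiple of fb / 4
msk = tone_correlation(fc + fb / 4, fc - fb / 4, T=1 / fb)     # spacing fb / 2
closer = tone_correlation(fc + fb / 5, fc - fb / 5, T=1 / fb)  # spacing 0.4 fb
print(f"spacing fb/2: {msk:+.2e}   spacing 0.4 fb: {closer:+.3f}")
```

The correlation is essentially zero for the spacing $f_b/2$ but about 0.117 for $0.4 f_b$, so the closer tones are no longer orthogonal.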


¹ The design of a look-up table using a maximal-length shift register was suggested by S Jayasimha
The excess carrier phase is given by $\theta(t) = b_k(t)\,\pi t/(2T) = \pm \pi t/(2T)$. This phase increases or decreases linearly during each bit period of T seconds. A bit $b_k = +1$ corresponds to an increase of the carrier phase by 90° over T seconds and to the higher frequency $f_h$. Similarly, $b_k = -1$ implies a linear decrease of phase by 90° over T seconds, corresponding to the lower frequency $f_l$. In order to make the phase continuous at bit transitions, the carrier frequency $f_c$ should be chosen to be an integral multiple of $f_b/4$, $f_b$ being the bit rate.

An optimum receiver should use a matched filter, which delivers sufficient statistics to the decision device. The matched-filter receiver correlates the received MSK signal with the recovered carrier and a sinusoid at half the symbol rate, followed by an integration operation.

3. Digital implementation of high speed modem 

The implementation of a high-speed modem on a single DSP depends on the use of memory-efficient and computationally efficient algorithms for modulation and demodulation.

In general, MSK modulation at the desired carrier can be implemented on a DSP in two 
ways: 

1. The use of a bit-by-bit algorithm in which individual input bits switch the modulated waveform between the higher and lower frequencies $f_h$ and $f_l$, preserving the phase continuity across bauds.

2. The use of a table look-up scheme to modulate multiple bits at once. 

However, a brute-force look-up table modulator would use a large amount of memory, which works out to $N \cdot a \cdot 2^{N+1}$ memory locations, where N is the number of bits taken for modulation at a time and a is the number of samples per bit. An innovative approach that minimizes the look-up table memory is described in the following subsection.

3.1 High speed MSK modulator 

The modulator employs two look-up tables, the Wave table and the Index table. The Wave table stores coded MSK waveforms at a suitable IF carrier, in the order obtained from a linear shift register sequence. Note that the wave table generated here is applicable only for carrier frequencies that are integral multiples of $f_b/4$.

Consider the case of modulating N input bits at a time. Here, N is chosen as an even number to ensure that the modulated waveform corresponding to the N bits ends with a phase of either 0 or π. The output sequence from the shift register contains a start-phase bit for the next N input bits along with 2 new bits. It should be noted that the complexity of the modulator decreases with increase in N.

For example, let N = 4. Given that $P_n(x)$ is a polynomial of degree 4, the following recurrence relation is used to generate a sequence of polynomials:

P n+] (X) = [x 2 • P n (x) + X 4 + X 7 • P n (X)], (3) 

where 



x 6 = x 4 + x, 

. X^ = x 4 4- X + 1, . . ; 

xf;;^=Q .jexcept for k = 0 , 6 ,9< 



The coefficient $x^k$ of the polynomial denotes the starting phase: 0 corresponds to a starting phase of 0 and 1 to a starting phase of π. The remaining bits correspond to the frequencies of the waveform for k bit durations, where a 0 denotes the low frequency, $f_l$, and a 1 the high frequency, $f_h$. Note that this particular coding corresponds to an implicit differential encoding at the modulator.

The linear recurrence in (3) produces all possible N-bit sequences except one, the identity element of the recurrence. In constructing the wave table, the waveform corresponding to the recurrence identity is stored first, followed by a waveform that is a continuation of the identity. Thereafter, the recurrence is used to generate the remaining $2^{N+1} - 2$ waveforms.

The total size of the table is $2 \cdot a \cdot (2^{N+1} + 1)$, as compared to $N \cdot a \cdot 2^{N+1}$ for the brute-force look-up table, where a is the number of samples per bit. Other sequences (derived from recurrences or a tree-search procedure) can be used to generate wave tables for other CPM schemes (Jayasimha & Harinath Reddy 1995).

The Index table comprises $2^{N+1}$ start indices into the wave table of the MSK modulated waveform corresponding to the N-bit input. It also provides the start-phase information for the next N-bit input sequence. The table is of length $2^{N+1}$ because the previous phase must prefix the N-bit information in order to preserve phase continuity across bauds.

The modulator takes N bits at a time from the incoming bit stream and forms an index with the current phase. This index is used to obtain, from the Index table, the wave table offset and the start-phase information for the next N input bits. The wave table offset gives the starting address of the MSK modulated waveform corresponding to the N-bit input. Starting from this offset in the Wave table, N·a samples (N bits of a samples each) are transmitted.
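The property exploited by the Index table, namely that with N even and $f_c$ an integral multiple of $f_b/4$ every N-bit word starts at phase 0 or π, can be illustrated with a simplified Python sketch. This is not the maximal-length shift-register Wave table described above: it builds a straightforward table with $2 \cdot 2^N$ entries keyed by (start phase, N-bit word), i.e., the brute-force $N \cdot a \cdot 2^{N+1}$ layout, and checks that concatenating table entries reproduces a bit-by-bit phase-continuous modulator. Bits select $f_h$ (1) or $f_l$ (0) directly, and the parameters (N = 4, a = 8 samples per bit, $f_c = 9f_b/4$) and function names are hypothetical.

```python
import math
import random
from itertools import product
from typing import Dict, List, Sequence, Tuple

def msk_word(q0: int, freq_bits: Sequence[int], a: int, m: int) -> Tuple[List[float], int]:
    """Samples of one word of a continuous-phase MSK signal (time in bit units, T = 1).

    q0        -- start phase in quarter-cycles (phase = q0 * pi / 2)
    freq_bits -- 1 selects f_h (phase +pi/2 per bit), 0 selects f_l (phase -pi/2 per bit)
    a         -- samples per bit;  m -- carrier quarter-cycles per bit (fc = m * fb / 4)
    Returns the a * len(freq_bits) samples and the end phase in quarter-cycles mod 4.
    """
    out, q = [], q0
    for b in freq_bits:
        d = 1 if b else -1
        for i in range(a):
            out.append(math.cos(q * math.pi / 2 + (m + d) * (math.pi / 2) * (i / a)))
        q = (q + m + d) % 4
    return out, q

def build_table(N: int, a: int, m: int) -> Dict[Tuple[int, Tuple[int, ...]], Tuple[List[float], int]]:
    """Brute-force table keyed by (start phase, N-bit word). With N even the start phase
    is always 0 or pi (q = 0 or 2), so 2 * 2**N entries suffice."""
    return {(q0, w): msk_word(q0, w, a, m) for q0 in (0, 2) for w in product((0, 1), repeat=N)}

def modulate_table(bits: Sequence[int], N: int, a: int, m: int) -> List[float]:
    """Table-driven modulator: one look-up per N bits (len(bits) must be a multiple of N)."""
    table, out, q = build_table(N, a, m), [], 0
    for k in range(0, len(bits), N):
        samples, q = table[(q, tuple(bits[k:k + N]))]
        out.extend(samples)
    return out

def modulate_direct(bits: Sequence[int], a: int, m: int) -> List[float]:
    """Bit-by-bit reference: cos(2*pi*fc*t + theta(t)) with theta(t) accumulated per bit."""
    out, theta = [], 0.0
    for n, b in enumerate(bits):
        d = 1 if b else -1
        for i in range(a):
            t = n + i / a
            out.append(math.cos(2 * math.pi * (m / 4) * t + theta + d * (math.pi / 2) * (i / a)))
        theta += d * math.pi / 2
    return out

rng = random.Random(5)
bits = [rng.randint(0, 1) for _ in range(32)]
tx = modulate_table(bits, N=4, a=8, m=9)
ref = modulate_direct(bits, a=8, m=9)
print("max difference:", max(abs(u - v) for u, v in zip(tx, ref)))
```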

3.2 High speed MSK demodulator 

The demodulator comprises IFIR digital mixing, I/Q separation, matched filtering and 
synchronization processing. 

The received MSK waveform is first separated into in-phase (I) and quadrature-phase (Q) 
channels using multirate IFIR filtering techniques (Jayasimha & Harinath Reddy 1995). 
Matched filtering is then performed on these base-band signals. The matched filter outputs 
are then rotated by −π/2 to account for the linear phase rotation of the MSK waveform. The rotated matched filter outputs are integrated over a baud to estimate the phase. Once the phase is estimated, a test is performed to check whether the signal is present. If the signal is present, coherent detection and differential decoding are performed. The bits obtained are then packed into the required format. The following subsections deal with these blocks individually.

3.2a IFIR digital mixer: The digital mixer uses interpolated FIR (IFIR) filters, first described by Neuvo et al (1984). The complexity of IFIR filtering can be further reduced by using an equiripple Mth-band filter as the shaping filter. This filter was designed using the method described by Jayasimha & Narasimha Rao (1995). This design produces a low-complexity IF/RF digital filter, thereby reducing the order of the analog filters used in the RF sections. The configuration of the IFIR filters is chosen based on the carrier frequency and the filtering requirements. In full-duplex systems, the transmit and receive frequencies are chosen as $f_s/4$ and $f_s/8$ or vice versa.
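The IFIR principle, a sparse "stretched" shaping filter $H(z^L)$ cascaded with a short image suppressor G(z), can be sketched in Python as below. The coefficients are placeholders rather than the equiripple Mth-band design referred to above, the stretch factor L = 4 is an assumption, and the helper names are hypothetical; the sketch only shows the structure and checks that stretching maps $H(e^{j\omega})$ to $H(e^{jL\omega})$.

```python
import cmath
from typing import List, Sequence

def stretch(h: Sequence[float], L: int) -> List[float]:
    """Interpolated shaping filter: L - 1 zeros between taps, i.e. H(z) -> H(z**L)."""
    out = [0.0] * ((len(h) - 1) * L + 1)
    for k, c in enumerate(h):
        out[k * L] = c
    return out

def convolve(x: Sequence[float], y: Sequence[float]) -> List[float]:
    """Direct-form linear convolution (cascade of two FIR filters)."""
    out = [0.0] * (len(x) + len(y) - 1)
    for i, xi in enumerate(x):
        for j, yj in enumerate(y):
            out[i + j] += xi * yj
    return out

def freq_response(h: Sequence[float], w: float) -> complex:
    """H(e^{jw}) = sum_n h[n] e^{-jwn}."""
    return sum(c * cmath.exp(-1j * w * n) for n, c in enumerate(h))

def ifir(shaping: Sequence[float], image_suppressor: Sequence[float], L: int) -> List[float]:
    """Overall IFIR response: stretched shaping filter followed by the image suppressor."""
    return convolve(stretch(shaping, L), image_suppressor)

# Placeholder coefficients (not a designed filter): symmetric, hence linear phase.
shaping = [-0.05, 0.0, 0.3, 0.5, 0.3, 0.0, -0.05]
suppressor = [0.25, 0.5, 0.25]
L = 4
h = ifir(shaping, suppressor, L)
nonzero = sum(1 for c in stretch(shaping, L) if c != 0.0)
print(len(h), "taps in the equivalent filter;", nonzero + len(suppressor), "multiplications per output")
```

Although the equivalent filter is 27 taps long, only the non-zero coefficients of the stretched filter and the three suppressor taps require multiplications, which is the source of the complexity saving; symmetric coefficients keep the overall response linear-phase.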

The computational advantage of IFIR mixing, downsampling and matched filtering (at 1.5 complex samples per bit), as opposed to IF matched filtering (24 real samples per bit), is clear from table 1, which compares the required MIPS at 288 kbps (with three timing hypotheses).

Table 1. Comparison of MIPS required at 288 kbps.

Function                                   No heterodyne        With heterodyne
                                           processing (MIPS)    processing (MIPS)
Digital mixing                             nil                  31
Matched filter                             84.6                 6.33
Sync. processing, phase estimation
  & coherent detection                     7.2                  7.2
Total                                      91.80                44.5

3.2b Digital matched filter: A filter whose impulse response is a time-reversed and delayed version of some signal is said to be matched to that signal. The purpose of the matched filter is to increase the signal component and decrease the noise component at the same time. This is equivalent to maximizing the signal-to-noise ratio at the output at some instant. The term base-band is used to designate the band of frequencies representing the original signal as delivered by a source of information.
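A minimal numerical sketch of the definition above: the matched filter is the time-reversed pulse, and its output at the end of the pulse equals the pulse energy, which is where the output SNR peaks in white noise. The half-sine pulse spanning two bits is an assumption chosen because MSK uses half-sinusoid weighting; the sample count is arbitrary.

```python
import numpy as np


def matched_filter(pulse):
    """Impulse response matched to `pulse`: its time-reversed copy."""
    return np.asarray(pulse)[::-1]


sps = 16
pulse = np.sin(np.pi * (np.arange(2 * sps) + 0.5) / (2 * sps))  # half-sine over 2 bits
h = matched_filter(pulse)
y = np.convolve(pulse, h)          # filter output for a noiseless pulse
peak = int(np.argmax(y))
print(peak == len(pulse) - 1, np.isclose(y[peak], np.sum(pulse ** 2)))  # True True
```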

The digital matched filters operating on the base-band signal are:
where $M = 2 \times (\text{bit duration}/\text{sample duration})$ and $\sigma$ is the timing offset.

These are finite-sum approximations to the exact matched filters. Notice that the range of integration of $X_k$ overlaps that of $X_{k+1}$, and similarly for $Y_k$ and $Y_{k+1}$. This means that the matched filtering must be accomplished by four matched filters or correlators, one pair for even values of $k$ and the other pair for odd values of $k$.

In order to address the issue of synchronization, a timing offset $\sigma$ has been introduced.

Bit rotation: Because the matched filter outputs for successive bits are rotated by $-\pi/2$, we define

$$R_{4m+\mu+1}(\sigma) + j\,S_{4m+\mu+1}(\sigma) = \left[X_{4m+\mu+1}(\sigma) + j\,Y_{4m+\mu+1}(\sigma)\right] e^{j\mu\pi/2} \qquad (5)$$

where $\mu \in \{0, 1, 2, 3\}$.

This is equivalent to the following re-ordering of the filter outputs:

k       1      2       3       4       5
R_k     X_1    -Y_2    -X_3    Y_4     X_5
S_k     Y_1    X_2     -Y_3    -X_4    Y_5
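The re-ordering in the table can be reproduced by multiplying the $k$-th complex output $X_k + jY_k$ (with $k = 1, 2, \ldots$) by $e^{j\mu\pi/2}$, $\mu = (k-1) \bmod 4$. The sketch below is illustrative; the function name and the use of a four-entry lookup instead of a complex exponential are our own choices.

```python
import numpy as np

# exp(j*mu*pi/2) for mu = 0, 1, 2, 3, stored exactly to avoid rounding error
ROTATION = np.array([1, 1j, -1, -1j])


def derotate(X, Y):
    """Undo the -pi/2 per-bit rotation: returns (R_k, S_k) for k = 1, 2, ..."""
    z = np.asarray(X, dtype=float) + 1j * np.asarray(Y, dtype=float)
    mu = np.arange(len(z)) % 4          # mu = 0 for k = 1, 5, 9, ...
    w = z * ROTATION[mu]
    return w.real, w.imag


R, S = derotate([1, 2, 3, 4, 5], [10, 20, 30, 40, 50])
# R holds X_1, -Y_2, -X_3, Y_4, X_5 and S holds Y_1, X_2, -Y_3, -X_4, Y_5
print(R, S)
```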




3.2c Synchronization processing: This consists of phase estimation at each timing offset, selection of the timing offset, and coherent demodulation using the phase estimated at the selected timing offset.

3.2c(i) Phase estimation: Phase estimation is done by integrating the rotated matched filter outputs over a baud using the following equations:

$$R_{sq}(\sigma) = \sum_k R_k^2(\sigma), \qquad S_{sq}(\sigma) = \sum_k S_k^2(\sigma), \qquad RS(\sigma) = \sum_k R_k(\sigma)\,S_k(\sigma). \qquad (6)$$

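Carrier phases $\theta(\sigma)$ for each of the three timing hypotheses are estimated using

$$\theta(\sigma) = 0.5\,\tan^{-1}\left[N(\sigma)/D(\sigma)\right] \qquad (7)$$

where

$$N(\sigma) = 2\,RS(\sigma) \qquad (8)$$

and

$$D(\sigma) = R_{sq}(\sigma) - S_{sq}(\sigma). \qquad (9)$$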
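The estimator above can be sketched as follows. The toy signal model ($R_k = d_k\cos\phi$, $S_k = d_k\sin\phi$ plus noise, with unknown data signs $d_k$) and the block length of 144 are illustrative assumptions. Because the data sign is removed by the squaring, and the plain arctangent returns values in $[-\pi/4, \pi/4]$, the estimate is only known up to multiples of $\pi/2$. The `D == 0` branch is our own guard.

```python
import numpy as np


def estimate_phase(R, S):
    """theta = 0.5 * arctan(N / D), N = 2*sum(R*S), D = sum(R^2) - sum(S^2)."""
    R = np.asarray(R, dtype=float)
    S = np.asarray(S, dtype=float)
    N = 2.0 * np.sum(R * S)
    D = np.sum(R * R) - np.sum(S * S)
    if D == 0.0:
        return 0.0 if N == 0.0 else float(np.copysign(np.pi / 4, N))
    return float(0.5 * np.arctan(N / D))


rng = np.random.default_rng(1)
d = rng.choice([-1.0, 1.0], size=144)   # unknown data signs
phi = 0.3                               # assumed true block phase (rad)
R = d * np.cos(phi) + 0.1 * rng.standard_normal(144)
S = d * np.sin(phi) + 0.1 * rng.standard_normal(144)
print(estimate_phase(R, S))             # close to 0.3
```

For a true phase of 1.2 rad the same function returns $1.2 - \pi/2$, which is why a later test on the rotated energies is needed to pick the correct quadrant.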


The input to the matched filter with $\phi$ as the block phase can be written as

$$I(t) = A\cos(\pi t/2T + \phi) \quad\text{and}\quad Q(t) = A\sin(\pi t/2T + \phi).$$

In matched filtering, the incoming signal is multiplied by $\sin(\pi t/2T)$ and integrated over a baud, which gives

$$R(t) = \frac{A}{2}\sum \left[\sin(\pi t/T + \phi) - \sin\phi\right],$$
$$S(t) = \frac{A}{2}\sum \left[\cos\phi - \cos(\pi t/T + \phi)\right].$$

Then, for $n$ bauds, $R(t)\,S(t)$ is given by

$$R(t)\,S(t) = \left[-A\,(n/2)\sin\phi\right]\left[A\,(n/2)\cos\phi\right] = -(A^2n^2/8)\sin 2\phi \qquad (10)$$

and

$$R^2(t) - S^2(t) = -(A^2n^2/4)\cos 2\phi. \qquad (11)$$

Now from (10) and (11), we have

$$\frac{2\,R(t)\,S(t)}{R^2(t) - S^2(t)} = \frac{-(A^2n^2/4)\sin 2\phi}{-(A^2n^2/4)\cos 2\phi} = \tan 2\phi.$$

Hence, the phase $\phi$ can be found from

$$\phi = 0.5\,\tan^{-1}\left[\frac{2\,R(t)\,S(t)}{R^2(t) - S^2(t)}\right]. \qquad (12)$$


3.2c(ii) Signal presence detection: Signal presence detection is implemented using the following relations:

$$I_{sq}(\sigma) = R_{sq}(\sigma)\cos^2\theta + S_{sq}(\sigma)\sin^2\theta + 2\,RS(\sigma)\cos\theta\sin\theta, \qquad (13)$$
$$Q_{sq}(\sigma) = R_{sq}(\sigma)\sin^2\theta + S_{sq}(\sigma)\cos^2\theta - 2\,RS(\sigma)\cos\theta\sin\theta. \qquad (14)$$

Figure 2. Digital implementation of MSK modem.

The signal is said to be present in the following two cases:

1. If $Q_{sq}(\sigma_0)$ is less than the threshold $\tau$ times $I_{sq}(\sigma_0)$, then the estimated phase is $\theta(\sigma_0)$ with a timing offset of $\sigma_0$.

2. If $I_{sq}(\sigma_0)$ is less than $\tau$ times $Q_{sq}(\sigma_0)$, then the estimated phase is $\theta(\sigma_0) - \pi/2$ with a timing offset of $\sigma_0$.

If neither condition is satisfied, the signal is declared not present.

Here, $\tau = 0.6612$ is a threshold that gives a probability of false alarm of approximately $10^{-3}$ in three independent tries at synchronization.
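The two-case test can be sketched for a single timing offset as below. Only the threshold $\tau = 0.6612$ comes from the text. The expression for $Q_{sq}$ is derived here from $I_k = R_k\cos\theta + S_k\sin\theta$ and $Q_k = -R_k\sin\theta + S_k\cos\theta$, and the function name is hypothetical. Because $\theta$ uses the plain arctangent, case 2 is what corrects a $\pi/2$ quadrant error.

```python
import numpy as np

TAU = 0.6612  # threshold quoted in the text


def detect_presence(R, S, tau=TAU):
    """Return (present, phase) for one timing offset."""
    R = np.asarray(R, dtype=float)
    S = np.asarray(S, dtype=float)
    r_sq, s_sq, rs = np.sum(R * R), np.sum(S * S), np.sum(R * S)
    N, D = 2.0 * rs, r_sq - s_sq
    theta = 0.5 * np.arctan(N / D) if D != 0 else (np.pi / 4) * np.sign(N)
    c, s = np.cos(theta), np.sin(theta)
    i_sq = r_sq * c * c + s_sq * s * s + 2.0 * rs * c * s   # energy along theta
    q_sq = r_sq * s * s + s_sq * c * c - 2.0 * rs * c * s   # energy orthogonal to theta
    if q_sq < tau * i_sq:          # case 1: phase is theta
        return True, theta
    if i_sq < tau * q_sq:          # case 2: phase is theta - pi/2
        return True, theta - np.pi / 2
    return False, None


d = np.array([1.0, -1, -1, 1, 1, -1, 1, 1])
print(detect_presence(d * np.cos(1.2), d * np.sin(1.2)))  # present, phase equal to 1.2 modulo pi
print(detect_presence([1, 1, 0, 0], [0, 0, 1, 1]))        # (False, None)
```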

3.2c(iii) Coherent detection: The decision variables $T_k$ are formed by

$$T_k = \cos\theta \cdot R_k(\sigma_0) + \sin\theta \cdot S_k(\sigma_0). \qquad (15)$$

Hard decision decoding is done by decoding the decision variable $T_k$ as '0' when it is less than zero and as '1' when it is greater than or equal to zero.

3.2c(iv) Differential decoder: The bits obtained from the decision variables in coherent detection are passed through a differential decoder of the form

$$b_k = \hat{b}_k \oplus \hat{b}_{k-1} \qquad (16)$$

where $\hat{b}_k$ are the detected bits and $\oplus$ stands for modulo-2 addition. The differentially decoded bits thus obtained are packed into the required format.
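Hard decisions and differential decoding can be sketched together. The initial reference bit and the matching transmitter-side encoder are assumptions added only so the round trip can be checked. The second check shows the practical point of differential coding: inverting every detected bit, as an uncorrected $\pi$ phase ambiguity would, corrupts only the first decoded bit.

```python
import numpy as np


def decision_variables(R, S, theta):
    """T_k = cos(theta) * R_k + sin(theta) * S_k."""
    return np.cos(theta) * np.asarray(R, float) + np.sin(theta) * np.asarray(S, float)


def hard_decisions(T):
    """Bit 1 if T_k >= 0, bit 0 otherwise."""
    return (np.asarray(T) >= 0).astype(int)


def differential_decode(d, initial=0):
    """b_k = d_k XOR d_(k-1); the reference bit d_(-1) = initial is an assumption."""
    d = np.asarray(d, dtype=int)
    previous = np.concatenate(([initial], d[:-1]))
    return d ^ previous


def differential_encode(b, initial=0):
    """Hypothetical matching encoder e_k = b_k XOR e_(k-1), used only for testing."""
    out, state = [], initial
    for bit in b:
        state ^= int(bit)
        out.append(state)
    return np.array(out, dtype=int)


rng = np.random.default_rng(2)
data = rng.integers(0, 2, size=32)
encoded = differential_encode(data)
T = decision_variables(2.0 * encoded - 1.0, np.zeros(32), 0.0)  # noiseless, theta = 0
print(np.array_equal(differential_decode(hard_decisions(T)), data))  # True
flipped = 1 - encoded                   # a pi phase error inverts every detected bit
print(np.array_equal(differential_decode(flipped)[1:], data[1:]))    # True
```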

3.2d Carrier acquisition and tracking: Carrier frequency offsets of up to $f_b/128$ Hz, where $f_b$ is the data rate, can be acquired by differencing phase estimates over shorter blocks of 16 bits.
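Differencing block phase estimates converts a constant frequency offset into a constant phase step of $2\pi\,\Delta f\,B/f_b$ per block of $B$ bits. The sketch below recovers $\Delta f$ from noiseless, modulo-$\pi$ phase estimates. The 48 kbps rate, the 200 Hz offset and the averaging of differences are illustrative assumptions, not the paper's tracking loop.

```python
import numpy as np


def wrap_half_pi(x):
    """Map angles to [-pi/2, pi/2): block phase estimates are only known modulo pi."""
    return (np.asarray(x) + np.pi / 2) % np.pi - np.pi / 2


def estimate_freq_offset(block_phases, block_bits, bit_rate):
    """Average wrapped phase step between successive blocks, converted to Hz."""
    dphi = wrap_half_pi(np.diff(block_phases))
    return np.mean(dphi) * bit_rate / (2 * np.pi * block_bits)


fb, block_bits, df = 48_000, 16, 200.0
true_phase = 0.1 + 2 * np.pi * df * block_bits / fb * np.arange(10)
measured = wrap_half_pi(true_phase)     # what a modulo-pi estimator would report
print(estimate_freq_offset(measured, block_bits, fb))  # approximately 200.0
```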

Similarly, carrier tracking is done by computing the difference in the phase estimates of successive blocks and then applying a correction to the carrier accordingly. The frequency range over which the demodulator provides accurate f

Figure 3. Performance of the modem in AWGN.

4. Performance results

The error probability in BPSK, QPSK and MSK is the same and is given by

$$P_e = 0.5\,\mathrm{erfc}\left(\sqrt{E_b/N_0}\right).$$

Results are shown in figure 3, where the solid line corresponds to the theoretical curve.
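The theoretical curve follows directly from the expression above; a short sketch using only the standard library, with $E_b/N_0$ given in dB:

```python
import math


def msk_ber_theory(ebn0_db):
    """P_e = 0.5 * erfc(sqrt(Eb/N0)), with Eb/N0 given in dB."""
    ebn0 = 10 ** (ebn0_db / 10)
    return 0.5 * math.erfc(math.sqrt(ebn0))


for snr_db in (0, 4, 8):
    print(snr_db, msk_ber_theory(snr_db))
```

At 0 dB this gives $0.5\,\mathrm{erfc}(1) \approx 0.0786$.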

In the presence of a frequency offset, the ... phase change over the estimation block is less than ... . This corresponds to 41.5 Hz for a data rate of 48 kbps. The performance of the modem under a frequency offset of 41.5 Hz is also shown in figure 3.
5. Conclusions and summary

An all-digital implementation of a ...
start phase is either 0 or $\pi$ for every N-bit input. It is also observed from simulations that the optimum block size for phase estimation and bit timing recovery is 144 bits. It is shown that at $E_b/N_0 > 1$ dB, the performance of the modem with IF processing is within 1.0 dB of the theoretical performance achievable in AWGN. With a frequency offset of 41.5 Hz, the performance degrades by less than 0.75 dB compared to that with no frequency offset.
Software Specification, Verification and Validation


Foreword 

With advances in technology, computers are increasingly being used to monitor and/or control complex, time-critical physical processes. These safety-critical systems, used in a spectrum of applications such as nuclear control systems, flight control systems and life-support systems, have extensive safety requirements. The use of formal methods has become pivotal in the design, development and maintenance of such safety-critical systems. These methods enforce several layers of specification, design, verification and validation.

This special issue is dedicated to several aspects of software specification, verification and validation.

The first paper, “A graphic language based on timing diagrams” by C Antoine, B Le Goff and J-E Pin, presents a new graphic language which can serve as a model for VLSI and control systems. The second paper, entitled “A real-time interval logic and its decision procedure” by Y S Ramakrishna, L K Dillon, L E Moser, P M Melliar-Smith and G Kutty, introduces a real-time future interval logic with simple graphical formulae for the specification of real-time/quantitative constraints on concurrent systems, and discusses its decision procedures and their complexity.

The third paper, “Validation and analysis of the futurebus arbitration protocol: A case study” by F Boussinot, S Ramesh, R K Shyamasundar and R de Simone, shows how synchronous languages such as Esterel are useful in the validation and analysis of protocols. It also discusses how semi-automatic tools built on such languages help in gaining confidence in the protocols. The paper by A Isli, “Converting a Büchi alternating automaton to a usual nondeterministic one”, gives a translation of formulae in linear propositional temporal logic into an equivalent Büchi alternating automaton, a construction widely used in the verification of protocols. The paper by W Reif and K Stenzel discusses the reuse of proofs in software verification and reports case studies carried out with the Karlsruhe Interactive Verifier (KIV).

The last paper by K Vidyasankar is concerned with the construction of weakly atomic 
variables which have found use in the development of wait-free protocols. 

April 1996 R K SHYAMASUNDAR 

Guest Editor 
Abstract. We present a new graphic language which can serve, for instance, as a model for VLSI and control systems. Its primitives are based on standard timing diagrams, which is a great advantage over other formalisms since designers can master it rapidly. The semantics is rigorously defined in the formalism of the theory of automata on infinite words. Using this formalism, we are able to give a rather precise upper bound on the expressive power of our graphic language in terms of a language-theoretic measure, the concatenation level. A detailed example is presented.

Keywords. Graphic language; timing diagrams; concatenation level. 


1. Introduction 

This paper emerged as the result of a discussion between circuit designers and researchers 
working in the area of specification languages on the one hand and automata theory on the 
other. It has a practical component, the description of a new formal specification language resembling timing diagrams, as well as a strong theoretical flavour, since the semantics of the language is based on results from the theory of automata on infinite words.

Our work is motivated by the following observation: circuit designers are often discouraged by the complexity of specification languages. In an effort to remedy this problem, we introduce a graphic language, called the Chronogram Language (Antoine & Le Goff 1991), the primitives of which are based on standard timing diagrams. Timing diagrams are a formalism commonly used in the community of circuit designers, so our language can be mastered rapidly. In other words, contrary to most formalisms, properties are drawn rather than written, and this pictorial representation is much more convenient for the non-specialist than an abstract formalism.
On the other hand, the use of pictures does not preclude a precise syntax and semantics.
It turns out that the primitives of our language can be conveniently interpreted as rational 
(also called regular) expressions on infinite words. It will follow from our syntax that all 
chronograms can be interpreted as rational expressions. This approach not only permits 
us to define rigorously the semantics of the chronogram language, but also gives precise 
information about its expressive power. Before stating our results, we need to briefly 
review some facts on rational sets of infinite words. 

There are two well-known scales to measure the complexity of a rational set, the logical scale and the combinatorial scale. The logical scale branches into two main parts, corresponding to first order logic and to monadic second order logic, respectively. Within first order logic one can define a hierarchy by counting the number of alternations between existential and universal quantifiers. The combinatorial scale, on the other hand, is based on the basic operations used to define the rational sets: boolean operations, concatenation product and iteration. It also branches into two main domains: the star-free sets (which can be defined without using iteration) and the rational sets. A hierarchy inside the star-free sets is obtained by counting the number of alternations between the use of the boolean operations and of the concatenation product. A nice (but non-trivial) feature is that the logical and the combinatorial scales coincide (Thomas 1982; Perrin & Pin 1986). Our main result states that the languages definable using chronograms are within level 3 in the star-free (or logical) hierarchy. This gives a rather precise upper bound on the expressive power of the Chronogram Language.

Although our language was originally designed for specifying circuit behaviour, it can serve more generally for modelling temporal properties. The Chronogram Language has been designed to provide designers with good expressive power for temporal properties. For instance, both safety and liveness properties can be expressed in the Chronogram Language, in contrast with other languages such as VHDL (Lipsett et al 1990), Lucid (Ashcroft & Wadge 1976), Lustre (Caspi & Halbwachs 1986) and Signal (Le Goff et al 1989), which cannot express liveness properties. To ensure compatibility with existing formalisms, the chronograms that represent safety properties can be compiled into VHDL (a standard description language used in circuit design) and Signal expressions, and liveness properties will be translated into CTL* in the future.

The paper is organized as follows. The Chronogram language is introduced through an 
example which is analysed later in § 6. The abstract syntax of the Chronogram Language is 
given in § 4. In order to keep the paper self-contained, the main definitions on languages and 
automata required for this paper are summarized in § 3. The semantics of the Chronogram 
Language are presented in § 5 and are illustrated by means of a detailed example in § 6. 
The paper concludes with our plan for future work. Our approach is illustrated by several 
examples of interpretations of chronograms involving rational sets of infinite words. 


2. A presentation of the Chronogram Language 

The CLOCK, shown at the top of the chronogram in figure 1, informally represents time. All events are synchronized on the rising edges of the clock, except when all zones below the clock are IRRELEVANT zones. In the latter case, the duration of the signal is not specified.

Figure 1. A chronogram.

This chronogram defines constraints on the events of three boolean signals: I, O and B. Each line is dedicated to a signal: the second one to B, the third one to O, and the first and the last ones to I. Each line consists of IRRELEVANT zones and bold line boxes. Only the bold line boxes are relevant for the definition of constraints. On the second line (dedicated to the B signal), there are three boxes with a solid line at the bottom, and three boxes with a solid line at the top. This means that B must carry the true value during the period of time represented by the first three boxes, and the false value during the period of time represented by the last three boxes. On the first and third lines, the boxes are labelled by a symbol (v, x or w). This means that during the period of time represented by the box, the signal carries the value v (resp. x or w). This value v (resp. x, w) is not specified in the chronogram but has to be the same in all boxes labelled by v (resp. x, w). A minus sign can be added in the left part of the box: in this case, the signal carries the value $\bar v$, the complement of the label v of the box. On the first line such a box is used with the symbol w.

The bold line boxes can be connected by arrows. The resulting graph can have several (simply) connected components. Each component defines a constraint. The relative location of the boxes is relevant only inside a connected component. For instance, the properties 1, 2, 4, 5, 6 and 3, which are detailed in § 6, are specified in this order by the chronogram. Let us consider the first property: when the gate is open, I and O carry the same value. The gate is open if and only if B carries the false value, and in this case I and O carry the value denoted by the symbol v in the chronogram: the arrows mean that “B carries the false value” implies that I and O carry the value v. The other properties can be read in a similar way in the chronogram: two linked arrows must be interpreted as a logical and.
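To make the reading of the first property concrete, here is a hypothetical check on a finite, clock-sampled trace. It deliberately ignores the box durations and any delay that the figure might impose, and simply tests that at every tick where B is false (gate open), I and O carry the same value.

```python
def gate_property_holds(B, I, O):
    """At every clock tick where B is false (gate open), I and O must be equal."""
    return all(i == o for b, i, o in zip(B, I, O) if not b)


B = [1, 0, 0, 1]
I = [0, 1, 0, 1]
O = [1, 1, 0, 0]
print(gate_property_holds(B, I, O))             # True
print(gate_property_holds(B, I, [1, 1, 1, 0]))  # False: gate open at tick 2 but I != O
```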

3. Languages and automata 

In this section, we briefly recall some basic definitions from the theory of automata needed 
in this article. For more details, the reader is referred to Eilenberg (1974, 1976), Perrin 
(1990) and Thomas (1990). We also define the language-theoretic hierarchy that will serve 
as a measure of the expressive power of the Chronogram Language. 
We denote respectively by $A^*$, $A^+$ and $A^\omega$ the sets of finite words, non-empty finite words and infinite words on an alphabet $A$. A language is a set of finite words, that is, a subset of $A^*$. The rational operations are the three operations union, product and star, defined on languages as follows:

(1) Union: $L_1 \cup L_2 = \{u \mid u \in L_1 \text{ or } u \in L_2\}$

(2) Product: $L_1L_2 = \{u_1u_2 \mid u_1 \in L_1 \text{ and } u_2 \in L_2\}$

(3) Star: $L^* = \{u_1 \cdots u_n \mid n \geq 0 \text{ and } u_1, \ldots, u_n \in L\}$
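The three rational operations can be tried out on small finite languages. In this sketch languages are Python sets of strings and the function names are our own. The star is truncated to words of bounded length, since $L^*$ is infinite as soon as $L$ contains a non-empty word.

```python
def union(L1, L2):
    return set(L1) | set(L2)


def concat(L1, L2):
    """Product L1 L2 = {u1 u2 | u1 in L1, u2 in L2}."""
    return {u + v for u in L1 for v in L2}


def star(L, max_len):
    """Words of L* of length <= max_len (the empty word is always included)."""
    result, frontier = {""}, {""}
    while frontier:
        frontier = {u + v for u in frontier for v in L
                    if v and len(u + v) <= max_len} - result
        result |= frontier
    return result


print(sorted(star({"a", "ab"}, 3)))  # ['', 'a', 'aa', 'aaa', 'aab', 'ab', 'aba']
```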

The set of rational (or regular) languages of $A^*$ is the smallest set of subsets of $A^*$ containing the finite sets and closed under finite union, product and star. For instance, $\{a, ab\}^*ab \cup (ba^*b)^*$ denotes a rational set. The rational subsets of $A^+$ are the rational subsets of $A^*$ that do not contain the empty word. It is possible to generalize the concept of rational languages to infinite words as follows. First, the product can be extended to $A^* \times A^\omega$ by setting, for $X \subseteq A^*$ and $Y \subseteq A^\omega$,

$$XY = \{xy \mid x \in X \text{ and } y \in Y\}.$$

Next, we define an infinite iteration $\omega$ by setting, for every subset $X$ of $A^+$,

$$X^\omega = \{x_0x_1x_2\cdots \mid \text{for all } i \geq 0,\ x_i \in X\}.$$

That is, $X^\omega$ is the set of infinite words obtained by concatenating an infinite sequence of words of $X$. By definition, a subset of $A^\omega$ is $\omega$-rational (or $\omega$-regular) if it is equal to a finite union of sets of the form $XY^\omega$, where $X$ and $Y$ are non-empty rational sets of $A^+$.

Boolean operations comprise union, intersection, complementation and set difference. 
It can be shown that the rational subsets of A* are closed under finite boolean operations. 
The set of star-free subsets of A* is the smallest set of subsets of A* containing the finite 
sets and closed under finite boolean operations and product. 

For instance, $A^*$ is star-free, since it is the complement of the empty set. More generally, if $B$ is a subset of the alphabet $A$, the set $B^*$ is also star-free, since $B^*$ is the complement of the set of words that contain at least one letter of $B' = A \setminus B$. This leads to the following star-free expression (where $X^c$ denotes the complement of a set $X$):

$$B^* = A^* \setminus A^*(A \setminus B)A^* = \left(\emptyset^c (A \setminus B)\, \emptyset^c\right)^c = \left(\emptyset^c (A^c \cup B)^c\, \emptyset^c\right)^c$$

Of course, $B^+ = B^* \setminus \{\varepsilon\}$ is also star-free.

The set of star-free subsets of $A^\omega$ is the smallest set $\mathcal{S}$ of subsets of $A^\omega$ closed under finite boolean operations and such that if $X$ is a star-free subset of $A^+$ and $Y \in \mathcal{S}$, then $XY \in \mathcal{S}$.

The definition of star-free languages of $A^*$ makes use of two different types of operations: boolean operations and concatenation product. By alternating the use of these two operations, one gets a hierarchy, called the concatenation hierarchy, defined as follows.

(1) The sets of level 0 are the empty set and $A^*$.

(2) For every integer $n \geq 0$, the sets of level $n + 1/2$ are the finite unions of sets of the form

$$L_0a_1L_1a_2 \cdots a_kL_k$$

where $L_0, L_1, \ldots, L_k$ are sets of level $n$ and $a_1, \ldots, a_k$ are letters.

(3) For every integer $n \geq 0$, the sets of level $n + 1$ are finite boolean combinations of sets of level $n + 1/2$.

Note that a set of level $m$ is also a set of level $n$ for every $n \geq m$. The languages of level 1/2 are the finite unions of languages of the form $A^*a_1A^*a_2 \cdots a_kA^*$, the languages of level 1 are finite boolean combinations of these languages, and so on. The following languages are of level 1 on the alphabet $A$:

$$B^* = A^* \setminus \bigcup_{a \in A \setminus B} A^*aA^*$$

$$\{\varepsilon\} = A^* \setminus \bigcup_{a \in A} A^*aA^*$$

$$B^+ = B^* \setminus \{\varepsilon\}$$
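As a sanity check of these level-1 descriptions, one can compare them with the direct definitions on all short words over a small alphabet. The alphabet $A = \{a, b, c\}$ and the subset $B = \{a, b\}$ are arbitrary choices, and a bounded enumeration like this is an illustration, not a proof.

```python
from itertools import product

A = "abc"   # example alphabet (assumption)
B = "ab"    # example subset B of A (assumption)


def words(alphabet, max_len):
    """All words over `alphabet` of length <= max_len."""
    for n in range(max_len + 1):
        for letters in product(alphabet, repeat=n):
            yield "".join(letters)


def in_level_half(w, a):
    """Membership in A*aA*: w contains at least one occurrence of the letter a."""
    return a in w


def in_B_star(w):
    """Level-1 form: complement of the union of A*aA* over letters a not in B."""
    return not any(in_level_half(w, a) for a in set(A) - set(B))


def in_epsilon(w):
    """{epsilon} = complement of the union of A*aA* over all letters a."""
    return not any(in_level_half(w, a) for a in A)


def in_B_plus(w):
    """B+ = B* minus {epsilon}."""
    return in_B_star(w) and not in_epsilon(w)


print(all(in_B_star(w) == (set(w) <= set(B)) for w in words(A, 4)))  # True
```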

The next proposition summarizes several results relative to this hierarchy.

PROPOSITION 1

(Brzozowski & Knast 1978; Perrin & Pin 1986)

(1) The finite languages have level 1.

(2) For each $n \geq 0$, the languages of level $n$ are closed under union, intersection and complement.

(3) For each $n \geq 0$, the languages of level $n + 1/2$ are closed under union, intersection and product.

(4) Let $n \geq 0$ and let $\varphi: A^* \to B^*$ be a monoid morphism. If $L$ is a language of level $n$ (respectively $n + 1/2$), then $\varphi^{-1}(L)$ is also of level $n$ (respectively $n + 1/2$).

(5) The hierarchy is strict for all $n$: there exist languages of level $n + 1$ that are not of level $n + 1/2$ and languages of level $n + 1/2$ that are not of level $n$.

Concatenation hierarchies can be extended to infinite words as follows (Perrin & Pin 1986).

(1) The sets of level 0 are the empty set $\emptyset$ and $A^\omega$.

(2) For every integer $n \geq 0$, the sets of level $n + 1/2$ are the finite unions of sets of the form $XaY$, where $X$ is a subset of $A^*$ of level $n + 1/2$, $Y$ is a subset of $A^\omega$ of level $n$ and $a$ is a letter.

(3) For every $n \geq 0$, the sets of level $n + 1$ are finite boolean combinations of sets of level $n + 1/2$.

4. An abstract syntax of the Chronogram Language 

The abstract syntax of the language is given by a grammar, in which the initial of each non-terminal is a capital letter (e.g. Clock) and each terminal is either written in capital letters (e.g. IDENTIFIER), or consists of a single lower-case letter (e.g. i) or of a non-alphabetic sign (e.g. 1, *).

The rules are grouped by level of derivation and every rule is written only once. As a 
consequence, the derivation rules of certain terms may precede some of their occurrences. 

Constraint ::= Property Constraint | Property

Property ::= Clock Hypothesis Conclusion

Hypothesis ::= TimeDiagram
Conclusion ::= TimeDiagram

TimeDiagram ::= MultiColumnList

Clock ::= IDENTIFIER

MultiColumnList ::= MultiColumn MultiColumnList | MultiColumn
MultiColumn ::= StaticMultiColumn | DynamicMultiColumn
StaticMultiColumn ::= Width StaticRowList

DynamicMultiColumn ::= FiniteLowerBound UpperBound DynamicRowList

StaticRowList ::= StaticRow StaticRowList | StaticRow
DynamicRowList ::= DynamicRow DynamicRowList | DynamicRow

StaticRow ::= StaticIntervalList IDENTIFIER
DynamicRow ::= DynamicIntervalList IDENTIFIER

UpperBound ::= FiniteUpperBound | *

Width ::= INTEGER
Length ::= INTEGER
FiniteLowerBound ::= INTEGER
FiniteUpperBound ::= INTEGER

StaticIntervalList ::= StaticInterval StaticIntervalList | NIL
DynamicIntervalList ::= DynamicInterval DynamicIntervalList | NIL

StaticInterval ::= Length PrimitiveSymbol
DynamicInterval ::= PrimitiveSymbol

PrimitiveSymbol ::= i | f | e | r | s | 1 | 0 | SymbolicValue
SymbolicValue ::= - IDENTIFIER | IDENTIFIER

The intuitive meaning of the primitive symbols is the following:
i (Irrelevant) The value of the signal is not specified and can be either 0 or 1.

1 The signal is stable and its value is 1.

0 The signal is stable and its value is 0.

f (Falling) The signal has one and only one falling edge (but may have 0, 1 or 2 rising edges).

r (Rising) The signal has one and only one rising edge (but may have 0, 1 or 2 falling edges).

s (Stable) The signal is stable but its value is unknown.
e (Edge) The value of the signal changes once and only once.

5. Semantics of the Chronogram Language 

The formal semantics of the Chronogram Language are given in terms of $\omega$-rational languages. More precisely, a certain rational language is associated with each of the graphic primitives of the Chronogram Language and with each variable. Next, to each operator of the Chronogram Language (generation of intervals, rows, columns, multicolumns, time diagrams, etc.) corresponds an operation on languages that preserves rationality. A distinguishing feature of the Chronogram Language is the use of symbolic values, or boolean variables. We shall first detail this particular aspect.

5.1 Boolean variables and valuations 

If $v$ denotes a boolean variable, $\bar v$ will denote its complement. Thanks to boolean variables, one can specify in the Chronogram Language not only properties like “the value of the signal at time $t$ is 0 (resp. 1)”, but also properties of the form “the value of the signal is $v$ at time $t$ and $\bar v$ at time $t + 3$”. In order to take these variables into account, it is convenient, in the first place, to represent a signal not as an infinite word on the alphabet $B = \{0, 1\}$, but as an infinite word on the extended alphabet $C = B \cup V \cup \bar V$, where $V$ is the set of variables used in the chronogram.

One goes back to the binary alphabet $B$ by associating a value with each variable. This can formally be realized by a valuation, that is, a map $\nu: C \to B$ such that

a) for all $b \in B$, $\nu(b) = b$,

b) for all $v \in V$, $\nu(\bar v) = \overline{\nu(v)}$.

For instance, the previous example would be interpreted as “the value of the signal is 0 at time $t$ and 1 at time $t + 3$” (which corresponds to the valuation $\nu$ defined by $\nu(v) = 0$) or “the value of the signal is 1 at time $t$ and 0 at time $t + 3$” (which corresponds to the valuation $\nu$ defined by $\nu(v) = 1$).

A valuation $\nu: C \to B$ defines in a natural way a function $\nu: C^* \to B^*$, by setting, for every word $c_1c_2 \cdots c_n \in C^*$,

$$\nu(c_1c_2 \cdots c_n) = \nu(c_1)\nu(c_2) \cdots \nu(c_n).$$

If $L$ is a subset of $C^*$, the set $\nu(L)$ is called the valuation of $L$.
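Valuations are easy to mechanize. In this sketch the encoding is hypothetical: constants are the integers 0 and 1, a variable is a string such as `"v"`, and its complement $\bar v$ is written `"~v"`. `apply_valuation` extends $\nu$ letter by letter as defined above, and `all_valuations` forms the union of $\nu(L)$ over every valuation of the variables occurring in $L$.

```python
from itertools import product


def apply_valuation(word, nu):
    """Extend a valuation nu (variable name -> 0/1) letter by letter."""
    out = []
    for c in word:
        if c in (0, 1):
            out.append(c)                 # nu(b) = b for constants
        elif c.startswith("~"):
            out.append(1 - nu[c[1:]])     # nu(v-bar) is the complement of nu(v)
        else:
            out.append(nu[c])
    return tuple(out)


def all_valuations(language):
    """Union of nu(L) over every valuation of the variables occurring in L."""
    names = sorted({c.lstrip("~") for w in language for c in w if isinstance(c, str)})
    result = set()
    for values in product((0, 1), repeat=len(names)):
        nu = dict(zip(names, values))
        result |= {apply_valuation(w, nu) for w in language}
    return result


# "the value is v at time t and v-bar at time t + 3", with t at position 0
word = ("v", 1, 0, "~v")
print(sorted(all_valuations([word])))   # [(0, 1, 0, 1), (1, 1, 0, 0)]
```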
5.2 Constraints on a single signal

A signal is considered as an infinite word $u$ on the binary alphabet $B$. As we shall see later, the constraints defined on a given signal in our language can always be formulated in the form $u \in LB^\omega$, where $L$ is a certain rational language of $B^*$, which we shall now compute in more detail.

If the chronogram contains variables, we first identify the signal with an infinite word $u$ on the alphabet $C$, as was explained before. The constraint in which the variables are not interpreted can be formulated in the form $u \in LB^\omega$, where $L$ is a certain rational language of $C^*$, while the final constraint can be expressed in the form

$$u \in \bigcup_{\nu \text{ valuation}} \nu(L)B^\omega.$$

There are in fact two types of constraints: the “static” constraints, which correspond to the case where $L$ is a finite language, and the “dynamic” constraints, which correspond to the case where $L$ can be an infinite language.

In the case of a static constraint, the language $L$ is obtained as a finite concatenation of rational languages corresponding to static intervals. For instance, the following sequence of static intervals defines a constraint: “between time $n_1$ and $n_2$, the signal has a unique rising edge, between time $n_2$ and $n_3$, its value is a constant $v$, and between time $n_3$ and $n_4$ its value is always 0”. Note that in this case, the values of $n_2 - n_1$, $n_3 - n_2$ and $n_4 - n_3$ are the lengths of the static intervals.

In the case of a dynamic constraint, the language $L$ is obtained as a finite concatenation of rational languages corresponding to dynamic intervals. For instance, the following sequence of dynamic intervals defines a constraint: “there exist instants $n_2$, $n_3$, $n_4$ such that between time $n_1$ and $n_2$, the signal has a unique rising edge, between time $n_2$ and $n_3$ its value is a constant $v$, and between time $n_3$ and $n_4$ its value is always 0”. The difference from the previous case is that the values of $n_2 - n_1$, $n_3 - n_2$ and $n_4 - n_3$ are not specified in dynamic constraints, that is, they can be chosen arbitrarily.

The languages associated with (static or dynamic) intervals are themselves obtained from the so-called primitive languages associated with the primitive symbols. This vocabulary concerns the elements of the set

$$V \cup \bar V \cup \{i, 0, 1, f, r, s, e\},$$

that is, all symbols of variables (possibly overlined) and the symbols associated with the graphic primitives of the Chronogram Language. Recall the intuitive meaning of these primitives.

i (Irrelevant) The value of the signal is not specified and can be either 0 or 1.

1 The signal is stable and its value is 1.

0 The signal is stable and its value is 0.

f (Falling) The signal has one and only one falling edge (but may have 0, 1 or 2 rising edges).

r (Rising) The signal has one and only one rising edge (but may have 0, 1 or 2 falling edges).

s (Stable) The signal is stable but its value is unknown.
e (Edge) The value of the signal changes once and only once.

This leads to the following table of the primitive languages associated with the graphic primitives:

$$L(i) = \{0, 1\}^+ \qquad L(1) = 1^+ \qquad L(0) = 0^+$$

$$L(f) = 0^*1^+0^+1^* \qquad L(r) = 1^*0^+1^+0^* \qquad L(s) = 0^+ \cup 1^+$$

$$L(e) = 0^+1^+ \cup 1^+0^+$$

On the other hand, the primitive languages associated with a variable $v$ and its complement are $L(v) = v^+$ and $L(\bar v) = \bar v^+$.

We therefore have

PROPOSITION 2

The primitive languages and their valuations are star-free languages of level 3/2.

Proof. We have already seen that the languages $L(i) = \{0, 1\}^+$, $L(1) = 1^+$, $L(0) = 0^+$, $L(v) = v^+$ and $L(\bar v) = \bar v^+$ are languages of level 1. It follows that $L(s) = 0^+ \cup 1^+$ is also of level 1. On the other hand, $L(f) = 0^*1^+0^+1^*$ is a product of languages of level 1 and thus is of level 3/2. A similar argument would show that the languages $L(r)$ and $L(e)$ are of level 3/2. Finally, each valuation of the languages $L(v)$ and $L(\bar v)$ is equal to either $0^+$ or $1^+$, which are languages of level 1. □

We can now formally define the notion of interval. A static interval is a couple $I = (l, t)$ where $l$ is a positive integer and $t$ is a primitive symbol. Intuitively, the integer $l$ represents the length of the interval on which the condition defined by $t$ will be considered. For example, if $l = 5$ and $t = e$, the value of the signal will change once and only once in the interval $[0, 5]$. The language associated with $I$ is the subset of $C^*$ defined by

$$L(I) = L(l, t) = L(t) \cap C^l.$$

For example, if $l = 5$ and $t = e$, then

$$L(I) = (0^+1^+ \cup 1^+0^+) \cap C^5 = \{01111, 00111, 00011, 00001, 10000, 11000, 11100, 11110\}.$$
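The example can be reproduced by enumerating all binary words of length $l$ and keeping those in $L(t)$. In this sketch the primitive languages from the table are written as Python regular expressions; only the binary alphabet is handled, not variables, and the names `PRIMITIVES` and `static_interval_language` are our own.

```python
import re
from itertools import product

# Primitive languages over {0, 1}, transcribed from the table as regexes
PRIMITIVES = {
    "i": r"[01]+",
    "1": r"1+",
    "0": r"0+",
    "f": r"0*1+0+1*",
    "r": r"1*0+1+0*",
    "s": r"0+|1+",
    "e": r"0+1+|1+0+",
}


def static_interval_language(length, symbol):
    """L(I) = L(t) intersected with the words of length l (binary letters only)."""
    pattern = re.compile(PRIMITIVES[symbol])
    candidates = ("".join(p) for p in product("01", repeat=length))
    return {w for w in candidates if pattern.fullmatch(w)}


print(sorted(static_interval_language(5, "e")))
# ['00001', '00011', '00111', '01111', '10000', '11000', '11100', '11110']
```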

A dynamic interval is simply a primitive symbol, and thus the corresponding language is already defined.

A static (resp. dynamic) row is a sequence of static (resp. dynamic) intervals (figure 2). The language associated with a row $(I_1, I_2, \ldots, I_s)$ is defined by

$$L(I_1, I_2, \ldots, I_s) = L(I_1)L(I_2) \cdots L(I_s).$$

For instance, the language of $C^+$ associated with the row represented below is $11v\{0, 1\}\bar v\bar v\{01111, 00111, 00011, 00001\}$.
Figure 2. A static row.

Here is another example, for a dynamic row. If $I_1 = v$, $I_2 = e$ and $I_3 = v$, then $L(I_1, I_2, I_3) = v^+(0^+1^+ \cup 1^+0^+)v^+$.

The languages associated with rows are described in the next proposition.

PROPOSITION 3

The languages associated with a static row and their valuations are finite languages. The languages associated with a dynamic row and their valuations are languages of level 3/2.

Proof. The language associated with a static row is a product of languages associated with static intervals, which are finite languages. Since the valuation of a finite language is finite, the first part of the statement follows.

The language associated with a dynamic row is a product of languages of dynamic intervals. Now, by proposition 2, the languages associated with dynamic intervals and their valuations are of level 3/2 and, by proposition 1, the product of languages of level 3/2 is also of level 3/2. □

Finally, if L is the language associated with a (static or dynamic) row, the constraint defined by this row is the set

$$\bigcup_{v \text{ a valuation}} v(L)\,B^\omega.$$

In other words, in order to compute the constraint defined by a row, one first computes the 
language L associated with this row on the extended alphabet C and then one simply gives 
a value to the variables. For instance, for the row represented in figure 2, the constraint can 
be written 


$$\bigl(111\{0,1\}00\{01111, 00111, 00011, 00001\} \cup 110\{0,1\}11\{01111, 00111, 00011, 00001\}\bigr)B^\omega$$
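The two-step recipe (build L over C, then apply every valuation) can be sketched for the row of figure 2. This assumes the row reads 11 v {0,1} v̄ v̄ followed by a rising edge of length 5, which is the reading that yields exactly the two terms displayed above. We encode v̄ as the character 'V', an encoding of our own. Only the finite prefix part is computed; the $B^\omega$ tail is left implicit.

```python
def concat(*languages: set) -> set:
    """Product L1 L2 ... Ls of finite languages (the language of a static row)."""
    result = {""}
    for lang in languages:
        result = {u + w for u in result for w in lang}
    return result


def valuate(word: str, bit: str) -> str:
    """Apply the valuation v -> bit (hence V, our stand-in for v-bar, -> the other bit)."""
    other = "0" if bit == "1" else "1"
    return word.replace("v", bit).replace("V", other)


# Hypothetical encoding of the extended alphabet C: '0', '1', 'v' and 'V' (v-bar).
RISING = {"01111", "00111", "00011", "00001"}
row = concat({"11"}, {"v"}, {"0", "1"}, {"VV"}, RISING)

# Finite part of the constraint: the union over both valuations of v(L).
constraint = {valuate(w, bit) for w in row for bit in "01"}
print(len(row), len(constraint))   # 8 16
```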

5.3 Constraints on several signals 

We now define the language associated with a constraint on a set of k signals. We first introduce some auxiliary notation. Let A be an alphabet. For each integer k, $A_k$ denotes the alphabet consisting of k-tuples of letters of A, each written as a column matrix. For instance, $B_8$ is the set of bytes, and any triple of bits written as a column is a letter of $B_3$. Thus

$$u = \begin{pmatrix}1\\1\\1\end{pmatrix}\begin{pmatrix}0\\1\\0\end{pmatrix}\begin{pmatrix}1\\0\\0\end{pmatrix}\begin{pmatrix}1\\0\\0\end{pmatrix}\begin{pmatrix}0\\0\\0\end{pmatrix}$$

is a word on the alphabet $B_3$. By reading in parallel the lines of the previous representation, one gets the three words 10110, 11000 and 10000. Therefore the word u given above can be represented by the triple (10110, 11000, 10000) of words of $B^*$. More generally, it is always possible to represent a word of length n on the alphabet $A_k$ as a k-tuple of words of $A^*$, each of length n.


a\,\ \ 


< « 2 , 1 \ 


( ®n, 1 ^ 

a\,2 


a 2,2 

. . . 

2 

\«i ,k) 


\ a2,k ) 


\ &n,k / 


We denote by tta ■ A| -> A* x A* x • • • x A* the function defined by 

' — 1 1 - v — 1 

k times 


TTa 


/ 


fin 


( a2 A 

a 2,2 


( a n,\ \ 


\ 


an,2 


\\ai,k/ \a2,kJ \a n ,k / / 

(fl\, 102,1 '• -< 2 / 1 , 1 , « i , 2 < 32,2 • • • a n ,2, ■■■, a\,ka2,k ■ ■ ■ an,k) 


This function $\pi_A$ is in fact a monoid morphism of $A_k^*$ into $A^* \times A^* \times \cdots \times A^*$: this simply means that it preserves the concatenation product. However, it is not an isomorphism (except if k = 1) because an element of $A^* \times A^* \times \cdots \times A^*$ may have components of different lengths. Let

$$D_k(A) = \{(u_1, u_2, \ldots, u_k) \in A^* \times A^* \times \cdots \times A^* \mid |u_1| = |u_2| = \cdots = |u_k|\}$$

denote the set of k-tuples of words of the same length. Now, since $\pi_A$ induces an isomorphism from $A_k^*$ onto $D_k(A)$, one can identify the k-tuples of $D_k(A)$ with the words of $A_k^*$.
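The identification of words of $A_k^*$ with tuples in $D_k(A)$ amounts to a transpose. A small sketch, representing a word over $A_k$ as a Python list of k-tuples; `pi` and `pi_inverse` are hypothetical names for $\pi_A$ and its inverse on $D_k(A)$.

```python
def pi(word, k):
    """pi_A: read the k lines of a word over A_k (a list of k-tuples) in parallel."""
    return tuple("".join(letter[i] for letter in word) for i in range(k))


def pi_inverse(components):
    """Inverse of pi_A, defined only on D_k(A) (all components of the same length)."""
    if len({len(c) for c in components}) > 1:
        raise ValueError("components have different lengths: not in D_k(A)")
    return [tuple(column) for column in zip(*components)]


u = [("1", "1", "1"), ("0", "1", "0"), ("1", "0", "0"), ("1", "0", "0"), ("0", "0", "0")]
print(pi(u, 3))   # ('10110', '11000', '10000')
```

Raising an error outside $D_k(A)$ mirrors the remark above: a tuple such as ("10", "1") has no preimage, which is why $\pi_A$ is not onto $A^* \times \cdots \times A^*$.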

Returning once again to signals, a static multicolumn is a couple M = (p, R) where p is a positive integer and $R = (R_1, \ldots, R_k)$ is a k-tuple of static rows. Intuitively, each row corresponds to a signal, but it is important to observe that two or more rows can represent the same physical signal. This allows one to impose several distinct constraints on a given signal and to conveniently display hypothesis-conclusion pairs, when a signal can figure in one set of hypotheses and in another set of conclusions. By definition, the language associated with a static multicolumn (p, R) is

$$L(p, R) = C_k^p \cap \pi_C^{-1}\bigl(L(R_1) \times L(R_2) \times \cdots \times L(R_k) \cap D_k(C)\bigr).$$

In other words, the k-tuples $(u_1, u_2, \ldots, u_k)$ such that $u_1 \in L(R_1)$, $u_2 \in L(R_2)$, ..., $u_k \in L(R_k)$ and $|u_1| = |u_2| = \cdots = |u_k| = p$ are selected and identified with words of $C_k^*$.
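Since every word of L(p, R) has length exactly p, the definition can be computed for finite row languages by filtering each row language by length and combining one word per row into columns. A sketch with row languages given as Python sets of strings; the name `static_multicolumn` and the sample languages are ours.

```python
from itertools import product


def static_multicolumn(p, row_languages):
    """L(p, R): pick one word of length p from each L(R_i) and read them as columns."""
    same_length = [sorted(w for w in lang if len(w) == p) for lang in row_languages]
    return {tuple(zip(*choice)) for choice in product(*same_length)}


L1 = {"01", "001"}
L2 = {"11", "10", "1"}
print(sorted(static_multicolumn(2, [L1, L2])))
# [(('0', '1'), ('1', '0')), (('0', '1'), ('1', '1'))]
```

Each inner pair is one letter of $C_2$: its first entry comes from the first row and its second entry from the second row.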

A dynamic multicolumn is a triple M = (n, m, R) where n is an integer, m is either an integer or the symbol *, and $R = (R_1, \ldots, R_k)$ is a k-tuple of dynamic rows. Define

$$C_k^{[0,m]} = \bigcup_{0 \le i \le m} C_k^i.$$

The language associated with a dynamic multicolumn is by definition

$$L(n, m, R) = C_k^n\bigl(C_k^{[0,m]} \cap \pi_C^{-1}(L(R_1) \times L(R_2) \times \cdots \times L(R_k) \cap D_k(C))\bigr),$$

$$L(n, *, R) = C_k^n\bigl(C_k^* \cap \pi_C^{-1}(L(R_1) \times L(R_2) \times \cdots \times L(R_k) \cap D_k(C))\bigr).$$

The difference between the two types of multicolumns is that, in a dynamic multicolumn, there may be no upper bound on the common length of the $u_i$'s.

One can show for multicolumns a result similar to the one obtained for rows.

PROPOSITION 4

The languages associated with a static multicolumn and their valuations are finite languages. The languages associated with a dynamic multicolumn and their valuations are languages of level 3/2.

Proof. Let M = (p, R) be a static multicolumn. Then the language associated with M is a subset of $C_k^p$ and hence is finite. Its valuations are subsets of $B_k^p$ and are also finite. The case of a dynamic multicolumn M = (n, m, R), where m is an integer, is similar. Finally, let M = (n, *, R) be a dynamic multicolumn. Then

$$L(M) = C_k^n\bigl(C_k^* \cap \pi_C^{-1}(L(R_1) \times L(R_2) \times \cdots \times L(R_k) \cap D_k(C))\bigr).$$

Since $C_k^n$ is a finite language, it is of level 1 by proposition 1. By the same proposition, the languages of level 3/2 are closed under intersection and product, and it remains to show that the language $\pi_C^{-1}(L(R_1) \times L(R_2) \times \cdots \times L(R_k) \cap D_k(C))$ is of level 3/2. Denote by $\pi_i$ the ith projection of $C_k^*$ on $C^*$, defined on letters by $\pi_i(c_1, c_2, \ldots, c_k) = c_i$. We first observe that

$$\pi_C^{-1}\bigl(L(R_1) \times L(R_2) \times \cdots \times L(R_k) \cap D_k(C)\bigr) = \bigcap_{1 \le i \le k} \pi_i^{-1}(L(R_i)).$$

Indeed, the above language is actually the set of k-tuples $(u_1, u_2, \ldots, u_k)$ such that $|u_1| = |u_2| = \cdots = |u_k|$ and $u_i \in L(R_i)$ for $1 \le i \le k$. Now the languages $L(R_i)$ are of level 3/2 by proposition 3, and by proposition 1, so are the languages $\pi_i^{-1}(L(R_i))$ and their intersection. Therefore, $\pi_C^{-1}(L(R_1) \times L(R_2) \times \cdots \times L(R_k) \cap D_k(C))$ is of level 3/2, as required.

Let $v : C \to B$ be a valuation. By definition, one has

$$L_v(M) = v\Bigl(C_k^n\bigl(C_k^* \cap \pi_C^{-1}(L(R_1) \times L(R_2) \times \cdots \times L(R_k) \cap D_k(C))\bigr)\Bigr) = B_k^n\, v\bigl(C_k^* \cap \pi_C^{-1}(L(R_1) \times L(R_2) \times \cdots \times L(R_k) \cap D_k(C))\bigr).$$

A lemma is needed to treat this expression.

Lemma 1. Let $v : C \to B$ be a valuation and let $L_1, L_2, \ldots, L_k$ be languages of $C^*$. Then the following formula holds:

$$v\bigl(C_k^* \cap \pi_C^{-1}(L_1 \times L_2 \times \cdots \times L_k \cap D_k(C))\bigr) = B_k^* \cap \bigcap_{1 \le i \le k} \pi_i^{-1}(v(L_i)).$$
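Proof. One has successively

$$\begin{aligned}
&v\bigl(C_k^* \cap \pi_C^{-1}(L_1 \times L_2 \times \cdots \times L_k \cap D_k(C))\bigr)\\
&= v\left(\left\{\begin{pmatrix}c_{1,1}\\\vdots\\c_{1,k}\end{pmatrix}\cdots\begin{pmatrix}c_{m,1}\\\vdots\\c_{m,k}\end{pmatrix} \;\middle|\; m \ge 0,\ c_{1,1}c_{2,1}\cdots c_{m,1} \in L_1,\ \ldots,\ c_{1,k}c_{2,k}\cdots c_{m,k} \in L_k\right\}\right)\\
&= \left\{\begin{pmatrix}v(c_{1,1})\\\vdots\\v(c_{1,k})\end{pmatrix}\cdots\begin{pmatrix}v(c_{m,1})\\\vdots\\v(c_{m,k})\end{pmatrix} \;\middle|\; m \ge 0,\ c_{1,1}\cdots c_{m,1} \in L_1,\ \ldots,\ c_{1,k}\cdots c_{m,k} \in L_k\right\}\\
&= \left\{\begin{pmatrix}b_{1,1}\\\vdots\\b_{1,k}\end{pmatrix}\cdots\begin{pmatrix}b_{m,1}\\\vdots\\b_{m,k}\end{pmatrix} \;\middle|\; m \ge 0,\ b_{1,1}\cdots b_{m,1} \in v(L_1),\ \ldots,\ b_{1,k}\cdots b_{m,k} \in v(L_k)\right\}\\
&= \{u_1u_2\cdots u_m \mid m \ge 0,\ \pi_1(u_1u_2\cdots u_m) \in v(L_1),\ \ldots,\ \pi_k(u_1u_2\cdots u_m) \in v(L_k)\}\\
&= B_k^* \cap \bigcap_{1 \le i \le k} \pi_i^{-1}(v(L_i)). \qquad \square
\end{aligned}$$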
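Lemma 1 can be sanity-checked by brute force on small finite languages. This is a check on one example, not a proof: letters of C are single characters, the variable is written 'v', and a valuation is a dictionary applied letter by letter. The same dictionary is used for every component, which is what makes the two sides agree. All names are ours.

```python
from itertools import product


def columns(components):
    """Identify an equal-length k-tuple of words with a word over C_k (a tuple of columns)."""
    return tuple(zip(*components))


def valued(symbol, env):
    """Valuation on one letter: variables are looked up in env, constants are kept."""
    return env.get(symbol, symbol)


def lhs(languages, env):
    """v(C_k^* ∩ pi_C^{-1}(L_1 x ... x L_k ∩ D_k(C))), valuing each column letter."""
    result = set()
    for choice in product(*languages):
        if len({len(w) for w in choice}) == 1:          # the tuple lies in D_k(C)
            word = columns(choice)
            result.add(tuple(tuple(valued(c, env) for c in col) for col in word))
    return result


def rhs(languages, env):
    """B_k^* ∩ (intersection over i of pi_i^{-1}(v(L_i))): value each L_i, then combine."""
    valued_langs = [{"".join(valued(c, env) for c in w) for w in lang} for lang in languages]
    return {columns(choice) for choice in product(*valued_langs)
            if len({len(w) for w in choice}) == 1}


L = [{"v0", "1v", "011"}, {"0v", "vv", "1"}]
for bit in "01":
    assert lhs(L, {"v": bit}) == rhs(L, {"v": bit})
```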


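Let us complete the proof of proposition 4. By lemma 1, one has

$$L_v(M) = B_k^n\, v\bigl(C_k^* \cap \pi_C^{-1}(L(R_1) \times L(R_2) \times \cdots \times L(R_k) \cap D_k(C))\bigr) = B_k^n\Bigl(B_k^* \cap \bigcap_{1 \le i \le k} \pi_i^{-1}(L_v(R_i))\Bigr).$$

The languages $L_v(R_i)$ are of level 3/2 by Proposition 3, and so are the languages $\pi_i^{-1}(L_v(R_i))$ by proposition 1; moreover $B_k^n$ is finite and $B_k^*$ is of level 1. Since the languages of level 3/2 are closed under intersection and product, the previous formula shows that $L_v(M)$ is of level 3/2. □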
5.4 Timing diagrams and properties

A timing diagram (TD) is a sequence of multicolumns. A property is a pair P = (M, N) of timing diagrams: $M = (M_1, M_2, \ldots, M_r)$ is the hypothesis and $N = (N_1, N_2, \ldots, N_r)$ is the conclusion. A property defines a particular binary relation on k-tuples of signals. The property is satisfied if every k-tuple of signals that satisfies the hypothesis also satisfies the conclusion.

In the rational language formalism, this can be translated as follows: an infinite word w on the alphabet $B_k$ satisfies a property P = (M, N) if, for each suffix s of w and for all valuations v, whenever there exists a factorization $s = u_1u_2\cdots u_ru_{r+1}$ with $u_1 \in L_v(M_1)$, $u_2 \in L_v(M_2)$, ..., $u_r \in L_v(M_r)$ and $u_{r+1} \in B_k^\omega$, there also exists a factorization $s = u'_1u'_2\cdots u'_ru'_{r+1}$ with $u'_1 \in L_v(M_1) \cap L_v(N_1)$, $u'_2 \in L_v(M_2) \cap L_v(N_2)$, ..., $u'_r \in L_v(M_r) \cap L_v(N_r)$ and $u'_{r+1} \in B_k^\omega$. This condition can be reformulated as follows.

Theorem 1. An infinite word w on the alphabet $B_k$ satisfies P if and only if none of its suffixes belongs to the set

$$K(P) = \bigcup_{v \text{ valuation}} L_v(M_1)L_v(M_2)\cdots L_v(M_r)B_k^\omega \setminus \bigl((L_v(M_1) \cap L_v(N_1))(L_v(M_2) \cap L_v(N_2))\cdots(L_v(M_r) \cap L_v(N_r))\bigr)B_k^\omega.$$

Proof. It is easier to consider the negation of the condition. By definition, an infinite word w on the alphabet $B_k$ does not satisfy P if and only if there exist a suffix s of w and a valuation v such that s has a factorization $s = u_1u_2\cdots u_ru_{r+1}$ with

$$u_1 \in L_v(M_1),\ u_2 \in L_v(M_2),\ \ldots,\ u_r \in L_v(M_r),\ u_{r+1} \in B_k^\omega$$

but such that

$$s \notin \bigl((L_v(M_1) \cap L_v(N_1))(L_v(M_2) \cap L_v(N_2))\cdots(L_v(M_r) \cap L_v(N_r))\bigr)B_k^\omega.$$

The formula of the statement follows immediately. □

COROLLARY 1 

An infinite word w on the alphabet $B_k$ satisfies a property P if and only if it belongs to the set $L(P) = B_k^\omega \setminus B_k^*K(P)$.
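Theorem 1 and corollary 1 suggest a simple finite-horizon check. The sketch below assumes one signal (k = 1), a valuation already applied, and static multicolumns, so that every word of $L_v(M_i)$ has the same length $p_i$ and the factorization of a suffix is unique. It reports the positions t at which trace[t:] starts with a word of $L_v(M_1)\cdots L_v(M_r)$ but not with the corresponding conclusion blocks. On a finite prefix it only detects violations that fit entirely in the prefix, so it approximates, rather than decides, membership in L(P). All names and the sample property are hypothetical.

```python
def violations(trace, hypothesis, conclusion):
    """Start positions t where trace[t:] matches the hypothesis blocks but some block
    misses the corresponding conclusion; only positions where the pattern fits."""
    lengths = [len(next(iter(block))) for block in hypothesis]
    total = sum(lengths)
    bad = []
    for t in range(len(trace) - total + 1):
        pieces, pos = [], t
        for size in lengths:
            pieces.append(trace[pos:pos + size])
            pos += size
        holds = all(piece in m for piece, m in zip(pieces, hypothesis))
        concluded = all(piece in n for piece, n in zip(pieces, conclusion))
        if holds and not concluded:
            bad.append(t)
    return bad


# Hypothetical property on one signal: after a rising edge 01, the next value is 1.
M = [{"01"}, {"0", "1"}]
N = [{"01"}, {"1"}]
print(violations("0110100", M, N))   # [3]
```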

We arrive at our main result. 

Theorem 2. For every property P, the set L(P) is a star-free set of level 3. 

Proof. Proposition 4 shows that the languages $L_v(M_i)$ and $L_v(N_i)$ are of level 3/2. Since the languages of level 3/2 are closed under intersection and product, the sets $L_v(M_1)L_v(M_2)\cdots L_v(M_r)B_k^\omega$ and $\bigl((L_v(M_1) \cap L_v(N_1))(L_v(M_2) \cap L_v(N_2))\cdots(L_v(M_r) \cap L_v(N_r))\bigr)B_k^\omega$ are also of level 3/2. Since there are only finitely many valuations, K(P) is a finite union of differences of such sets. Therefore K(P) is of level 2 and L(P) is of level 3. □
5.5 Constraints

We call a constraint a finite sequence of properties. Let $(P_1, P_2, \ldots, P_n)$ be a constraint and let $L(P_1), L(P_2), \ldots, L(P_n)$ be the sets of infinite words defined by $P_1, P_2, \ldots, P_n$, respectively. Then the set of words defined by $(P_1, P_2, \ldots, P_n)$ is the language $L(P_1) \cap L(P_2) \cap \cdots \cap L(P_n)$. In other words, a constraint is a conjunction of properties. Now, the languages of level 3 are closed under intersection. Therefore, theorem 2 implies the following result.

COROLLARY 2 

The set of infinite words defined by a constraint is a star-free language of level 3. 

6. An example 

This section is devoted to the detailed study of an example. Consider a car washing machine. 
We propose to specify its control system using chronograms. The wash does not take more 
than one car at a time. There is a gate at the entrance. This gate is closed while a car is in the 
wash and open if the wash is empty. Moreover, there are two switches, set respectively
at the entrance and at the exit of the wash. These switches may be either on or off at any 
instant, subject to the constraint that the one at the entrance toggles every time a car enters 
the wash, and the one at the exit toggles every time a car exits from the wash. 

The car wash control system can be modelled by three boolean signals denoted by B, I, and O. The signal B (Entrance) carries the value 0 to model the open gate and the value 1 to model the closed gate. The signals I (In) and O (Out) model the entrance switch and the exit switch, respectively. Initially, the value of the signals B, I, and O is set to 0, which means the wash is empty and its gate is open.

The following five properties specify the car wash control system. This set of properties may be neither consistent nor minimal. The figures below show the chronograms for these properties. An automaton model of the system induced from these properties is also proposed. We then prove that a sixth property is effectively satisfied by the automaton. Here are the first five properties.

(1) At any instant, if the gate is open, I and O carry the same value.

(2) At any instant, if I and O carry the same value, the gate is open.

(3) If the gate is closed during an instant t, then I is stable between t and t + 1 (since the 
machine cannot wash more than one car at a time). 

(4) If the gate is open during an instant t, then O is stable between t and t + 1 (since the 
machine is empty). 

(5) As soon as a car enters the wash, the gate closes. 

The property to be proved is the following: 

(6) When the gate is closed, the car that is in the machine will eventually exit. 

The chronograms of these properties are drawn in figures 3 to 5. An explanation is in order for the chronogram representing property 6 (figure 5). Indeed, the clock signal seems to count off one time unit between v and v. However, since the signals I and O carry the value IRRELEVANT during the same period, the clock signal is irrelevant. Thus property 6 is an unbounded liveness property.

Figure 3. The chronograms of properties 1 and 2.


The semantics of these chronograms can be expressed by rational ω-expressions. The basic alphabet is $B_3 = \{0, 1\}^3$. Each matrix $(v_1, v_2, v_3)^{\mathsf T}$ represents one value of the triple (B, I, O). The set of these matrices is the alphabet of the language on which the previous properties are defined. In the following definitions, $v, v_1, v_2$, etc. will denote boolean variables.

Figure 4. The chronograms of properties 3 and 4.

Figure 5. The chronograms of properties 5 and 6.


a) The first constraint states that if B = 0, then I and O carry the same value. In other words, the letters $(0,0,1)^{\mathsf T}$ and $(0,1,0)^{\mathsf T}$ cannot occur. Thus the set $C_1$ associated with the first constraint is defined by

$$C_1 = B_3 \setminus \{(0,0,1)^{\mathsf T}, (0,1,0)^{\mathsf T}\}.$$


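b) The second constraint states that if I and O carry the same value, then B = 0. In other words, the letters $(1,0,0)^{\mathsf T}$ and $(1,1,1)^{\mathsf T}$ cannot occur, either. Therefore, the set $C_2$ associated with the second constraint is defined by

$$C_2 = \{(0,0,0)^{\mathsf T}, (0,0,1)^{\mathsf T}, (0,1,0)^{\mathsf T}, (0,1,1)^{\mathsf T}, (1,0,1)^{\mathsf T}, (1,1,0)^{\mathsf T}\}.$$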


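c) The third constraint states that if the first component of the letter u(t) is 1 (the gate is closed), then the letter u(t + 1) has the same second component as u(t). This can be written as $B_3^\omega \setminus B_3^*K_1B_3^\omega$, where

$$K_1 = \{(1,1,v_1)^{\mathsf T}(v_2,0,v_3)^{\mathsf T},\ (1,0,v_1)^{\mathsf T}(v_2,1,v_3)^{\mathsf T} \mid v_1, v_2, v_3 \in B\}.$$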


d) The fourth constraint states that if the first component of the letter u(t) is 0 (the gate is open), then the letter u(t + 1) has the same third component as u(t). This can be written as $B_3^\omega \setminus B_3^*K_2B_3^\omega$, where

$$K_2 = \{(0,v_1,0)^{\mathsf T}(v_2,v_3,1)^{\mathsf T},\ (0,v_1,1)^{\mathsf T}(v_2,v_3,0)^{\mathsf T} \mid v_1, v_2, v_3 \in B\}.$$

e) The fifth constraint states that if the second components of u(t) and u(t + 1) differ (a car has entered), then the first component of u(t + 1) is 1. This can be written as $B_3^\omega \setminus B_3^*K_3B_3^\omega$, where

$$K_3 = \{(v, v_1, v_2)^{\mathsf T}(0, \bar v_1, v_3)^{\mathsf T} \mid v, v_1, v_2, v_3 \in B\}.$$


Let us consider the system specified by these five properties. The first two define the set of words $C^\omega$, where

$$C = C_1 \cap C_2 = \{(0,0,0)^{\mathsf T}, (1,0,1)^{\mathsf T}, (1,1,0)^{\mathsf T}, (0,1,1)^{\mathsf T}\}.$$

The last three define the set of words $B_3^\omega \setminus B_3^*KB_3^\omega$, where $K = K_1 \cup K_2 \cup K_3$.

The five properties altogether define the set of words $S = C^\omega \setminus C^*KC^\omega$.

Since the initial value of the three boolean signals is set to 0, the car wash control system can be represented by the automaton shown in figure 6.

Note that the states of this automaton are solutions of the equation I + O = B mod 2. 
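The automaton can be recomputed from properties 1 to 5. In the sketch below, letters are triples (B, I, O), properties 1 and 2 select the alphabet C, and properties 3 to 5 become a predicate on two-letter factors. This encoding is ours, and we assume the transitions of figure 6 are exactly the factors it allows. The code explores the states reachable from (0, 0, 0) and checks the parity claim above.

```python
from itertools import product

# Letters are triples (B, I, O); B = 1 means the gate is closed.
LETTERS = list(product((0, 1), repeat=3))

# Properties 1 and 2: the gate is open (B = 0) exactly when I and O agree.
C = [a for a in LETTERS if (a[0] == 0) == (a[1] == a[2])]


def forbidden(a, b):
    """True if the two-letter factor ab violates property 3, 4 or 5."""
    k1 = a[0] == 1 and b[1] != a[1]   # property 3: gate closed, I must stay stable
    k2 = a[0] == 0 and b[2] != a[2]   # property 4: gate open, O must stay stable
    k3 = b[1] != a[1] and b[0] == 0   # property 5: a car entered, the gate must close
    return k1 or k2 or k3


def successors(a):
    """Letters of C that may follow a."""
    return {b for b in C if not forbidden(a, b)}


# States reachable from the initial letter (0, 0, 0).
reachable, frontier = {(0, 0, 0)}, [(0, 0, 0)]
while frontier:
    for b in successors(frontier.pop()):
        if b not in reachable:
            reachable.add(b)
            frontier.append(b)

print(sorted(reachable))                                 # [(0, 0, 0), (0, 1, 1), (1, 0, 1), (1, 1, 0)]
print(all((i + o) % 2 == b for b, i, o in reachable))   # True
```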
Figure 6. The automaton of the five properties.

Consider the sixth property, which is a liveness property. It says that, given an instant t such that u(t) is of the form $(1, v_2, v_3)^{\mathsf T}$, there exists a subsequent instant s (s > t) such that u(s) is of the form $(v'_1, v'_2, \bar v_3)^{\mathsf T}$. Thus, property (6) is described by a rational expression involving six multicolumns: $M_1, M_2, M_3, N_1, N_2, N_3$. The languages they define are, respectively,



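$$L(M_1) = \{(1, v_2, v_3)^{\mathsf T} \mid v_2, v_3 \in B\}, \qquad L(N_1) = \{(v'_1, v'_2, v_3)^{\mathsf T} \mid v'_1, v'_2 \in B\},$$

$$L(M_2) = B_3^*, \qquad L(M_3) = B_3, \qquad L(N_2) = B_3^*, \qquad L(N_3) = \{(v''_1, v''_2, \bar v_3)^{\mathsf T} \mid v''_1, v''_2 \in B\}.$$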



Let K(P) be the language representing the property. It follows 
K(P) = KiKfUK 3 K%, 

where 



It is theoretically possible to compute the automaton associated with K(P), and then to prove that S is a subset of $B_3^\omega \setminus B_3^*K(P)$, the set given by corollary 1. More directly, one can observe that $S \subseteq B_3^\omega \setminus B_3^*K(P)$ is equivalent to $S \cap B_3^*K(P) = \emptyset$, which can be rewritten as

$$T \cap K(P) = \emptyset,$$

where the set $T = \{u \in B_3^\omega \mid vu \in S \text{ for some } v \in B_3^*\}$ is recognized by the Büchi automaton (Thomas 1990) given in figure 6, by taking all the states as initial and final states. Recall that an infinite word u is accepted by a Büchi automaton if there is at least one infinite run with label u starting at some initial state and visiting a final state infinitely often. Thus the equality $T \cap K(P) = \emptyset$ can be directly verified on this automaton, since no word of K(P) can have an infinite run on the Büchi automaton for T.
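The verification above relies on a standard fact about Büchi automata: the accepted language is non-empty if and only if some final state is reachable from an initial state and lies on a cycle. A generic sketch of that test, ignoring transition labels (which do not affect emptiness), with an automaton given as a dictionary mapping each state to its successors. Applying it to $T \cap K(P)$ would additionally require a product construction, which is not shown here; the function names are ours.

```python
from collections import deque


def reachable(graph, sources):
    """States reachable from `sources` in zero or more steps (breadth-first search)."""
    seen, queue = set(sources), deque(sources)
    while queue:
        state = queue.popleft()
        for nxt in graph.get(state, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def buchi_nonempty(graph, initial, final):
    """Non-empty iff a final state is reachable from an initial state and lies on a cycle,
    i.e. is reachable from its own successors."""
    for f in reachable(graph, initial) & set(final):
        if f in reachable(graph, graph.get(f, ())):
            return True
    return False


g = {0: [1], 1: [2], 2: [1]}
print(buchi_nonempty(g, [0], [2]), buchi_nonempty(g, [0], [0]))   # True False
```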


7. Conclusion

We have presented a new formal language for the specification of temporal properties of Discrete Event Dynamic Systems. This language, called the Chronogram Language, is based on a well-known graphic metaphor: waveforms. It allows certain complex temporal properties to be specified more conveniently than with textual temporal logics (CTL, CTL*, ...). Although we do not consider this language as a universal one, we think that its graphical approach might be appealing to designers. In fact, we view the chronogram language as a basic part of a future Computer Aided Design (CAD) environment including validation tools. Several authors have developed similar work (Borriello 1992; Cingel 1993; Coombes & McDermid 1993; Dillon et al 1994; Helbig et al 1993; Khordoc et al 1991; Tiedemann 1992; Tiedemann et al 1992). It would take too long to compare these related works with ours in detail. However, the idea of using rational expressions to define the semantics seems to be new in this context and can probably be successfully applied in other cases. This is not really surprising, since the equivalence between propositional (linear) temporal logic, first-order logic over the non-negative integers with signature $(<, (R_a)_{a \in A})$ (where $R_a$ is a predicate giving the positions of the letter a in a given infinite word of $A^\omega$) and star-free sets of infinite words is a well-known fact.

The Chronogram Language is graphic and fully declarative. In this paper, we rigorously defined its semantics by using automata theory. The main result of this work is that it is possible to associate a finite automaton with any chronogram. This means that chronograms are ω-rational. In fact, as shown in this paper, chronograms correspond to a much smaller class than the class of ω-rational sets, and this may lead to some specific compilation algorithms in the future.

We are now studying new developments: an extension of the Chronogram Language 
allowing designers to specify properties without any reference to some clock or including 
timing aspects (having physical time delays) and a consistency checking tool for sets of 
chronograms. We are also working on the improvement of the compilation algorithm, since it is crucial to compile chronograms into automata that are as small as possible. Currently, compilers
generate VHDL code and Signal code. New output languages will also be available in the 
future. 


Thanks are due to the anonymous referees for their numerous suggestions and remarks 
that greatly improved the quality of this paper. 
Abstract. Real-Time Future Interval Logic is a temporal logic in which formulae have a natural graphical representation, resembling timing diagrams. It is a dense real-time logic that is based on two simple temporal primitives: interval modalities for the purely qualitative part and duration predicates for the quantitative part. This paper describes the logic and gives a decision procedure for satisfiability by reduction to the emptiness problem for Timed Büchi Automata. This decision procedure forms the core of an automated proof-checker for the logic. The logic does not admit instantaneous states, and is invariant under real-time stuttering, properties that facilitate proof methods based on abstraction and refinement. The logic appears to be as strong as one can hope for without sacrificing elementary decidability. Two natural extensions of the logic, along lines suggested in the literature, lead to either non-elementariness or undecidability.

Keywords. Interval logics; concurrent systems; real-time temporal logics; 
hierarchical refinement. 

1. Introduction 

Specification and verification of concurrent systems is difficult in part because the many possible alternative interleavings of activities generate a large number of cases that must be considered. The presence of real-time constraints, and their interaction with constraints on the interleavings, makes the problem even more difficult. Propositional temporal logic (PTL) and the propositional μ-calculus are too low-level to capture abstract system requirements easily without including extraneous details that can bias subsequent implementations. Interval logics aid the specification of concurrent systems by providing temporal modalities designed explicitly to ease the definition of temporal contexts and of properties required to hold in such contexts.

Interval logics also permit natural graphical representations, which are usually more intuitive and easier to understand than their textual counterparts. When expressed graphically, interval logic formulae resemble the "back-of-the-envelope" timing diagrams that designers typically draw to document and reason about temporal properties of their designs. Interval logics, in their graphical representation, could serve to extend existing design and documentation environments to the more challenging task of verification of concurrent systems.

However, most known interval logics are either non-elementary or even undecidable. In particular, the Interval Temporal Logic of Moszkowski (Halpern et al 1983) is provably non-elementary and the Modal Logic of Time-Intervals of Halpern & Shoham (1991) is undecidable. In Ramakrishna et al (1992) we presented an interval logic, called Future Interval Logic (FIL), and a decision procedure for it. As far as we are aware, this is the first, and indeed the only, interval logic known today with an elementary decision procedure.
Examples illustrating the use of FIL in specification and verification appear in Dillon et al 
(1992) and Kutty et al (1993). However, FIL is a “timeless” logic, with no quantitative 
notion of time. 

There are numerous applications, however, where a purely qualitative notion of time 
is insufficient, because correctness depends crucially on real-time constraints between 
events in a system. This has led to real-time extensions of temporal logics (Jahanian & 
Mok 1986; Narayana & Aaby 1988; Alur & Henzinger 1989; Emerson et al 1990; Lewis 
1990). The theory of timed-automata of Alur & Dill (1990) and Alur & Henzinger (1992) 
has helped clarify fundamental issues regarding the decidability of real-time temporal 
logics. These results have not, however, been applied to real-time extensions of interval 
logics (Melliar-Smith 1987; Narayana & Aaby 1988; Razouk & Gorlick 1989) to establish 
their decidability or to obtain “efficient” decision procedures. 

In this paper we extend FIL to real-time. We associate the domain of non-negative reals 
with a computation, and extend the language of FIL to allow statements about the durations 
of intervals. This gives a relatively clean extension of FIL. Firstly, the extension is conservative. All tautologies of FIL are tautologies of this logic. Moreover, the tautologies of RTFIL, restricted to the language of FIL, are precisely the tautologies of FIL. Secondly, the extension does not sacrifice decidability. RTFIL is decidable by reduction to the emptiness problem for Timed Büchi Automata; this constitutes the main result of this paper. Finally, the extension is adequate. RTFIL has the expressiveness needed for real-time reasoning.
We give an example of its use in Ramakrishna et al (1993), where a proof-checker based 
on the decision procedure presented here is used to verify a simple real-time system. 

Our work, like Barringer et al (1986) and Alur et al (1991) but unlike Narayana & Aaby 
(1988), Emerson et al (1990), Jahanian & Mok (1986) and Razouk & Gorlick (1989), uses 
a dense model of time. A dense time domain is preferable to a discrete time domain for specifying concurrent systems because independent events in asynchronous components may occur arbitrarily close in time. It is not possible, therefore, to bound a priori the granularity of the underlying time domain, as required for a discrete model. A dense time domain is also preferable for carrying out hierarchical verification, since proofs remain valid under refinement or abstraction. A dense time domain facilitates compositional specification and verification of real-time systems, where a component's (timing) semantics must be independent of (the timing granularity of) the environment in which it may operate. Dense time is also required when a component interacts with the continuous world; hybrid systems are a good example (Maler et al 1991). Numerous real-world and process control applications,
thus, require a dense model of time. 

Unlike most real-time temporal logics, RTFIL is insensitive to instantaneous states.¹
This semantics agrees with our intuition that a property of a system can be “observed” 
only if it persists for some measurable amount of time. It may be counterproductive, when 
specifying systems, to impose instantaneous requirements on behaviours. Specifications 
whose only satisfying models contain instantaneous states obstruct the use of hierarchical 
refinement in much the same way that the next operator obstructs hierarchical refinement 
(Barringer et al 1986; Lamport 1991) in non-real-time temporal logics. In the case of 
RTFIL, the absence of instantaneous states, in concert with its restricted syntax, results in 
the property that, for any model whose valuation function is right-continuous, the valuation 
function extended to an arbitrary RTFIL formula is also right-continuous. This property of 
“temporal interpolation” is expected to facilitate proofs based on successive refinement or 
abstraction, where the refinement mapping defining a predicate at one level may involve 
an arbitrary RTFIL formula on predicates from an adjacent level. 

This paper is organized as follows. Section 2 introduces Real-Time Future Interval Logic 
(RTFIL) by means of a simple graphical formula. It then defines a textual syntax, intended 
models, and semantics of RTFIL. Section 3 contains some preliminary definitions and 
notation. The decision procedure is described in § 4 and its correctness is proved in § 5. 
We present complexity results in § 6. In § 7 we discuss related work and conclude in § 8 
with some open problems and on-going work. 


2. The logic 

We first provide a very informal introduction to RTFIL and illustrate the graphical representation of formulae. RTFIL is a linear-time temporal logic. Thus, a formula is interpreted on a linear trace of states, representing a possible execution of a transition system (or a fragment of such an execution). Every trace has an initial state. Traces may, however, be unbounded and may thus represent nonterminating behaviours. We assume that the states of the transition system are continuously observed at all $t \in R$ (the set of non-negative reals); thus, every trace of the system is a dense real-time trace.

The key constructs of RTFIL are the interval modality and the duration constraint. 
Syntactically, an interval modality is constructed by means of searches and other (simpler) 
RTFIL formulae. Semantically, an interval modality extracts a convex subset from a given 
dense trace. This convex subset specifies the interval over which a property designated by 
a nested formula holds. The duration constraint is expressed using the special predicate 
len, and specifies rational lower and upper bounds on the length of an interval. 

An interval is constructed using a pair of search patterns: searches are shown dashed
with arrowheads, and target formulas are left-justified below the arrowheads. The semantics 


¹The decidability results presented here do not require this semantics.
Figure 1. An example specification in Graphical Interval Logic.

of a search that starts at a point in the trace is that the search locates the earliest point in 
the reflexive future where the target formula holds. When such searches are composed 
sequentially into a search pattern, every subsequent search begins at the state where the 
previous search ended. In case the target of a search is not satisfied at any point in the future 
of the current point within the previous outer interval, the formula is assumed to be true 
by default if the search is “weak” (shown by a single arrowhead) and false if the search is 
“strong” (shown by a double arrowhead). Intervals are shown solid with square brackets 
on the left and parentheses on the right. A formula drawn left-justified below the start of 
an interval must hold at the first state of that interval, while a formula indented below an 
interval must hold throughout the interval. In the graphical representation of an RTFIL 
formula, the horizontal dimension shows progression through the trace (time progresses 
from left to right) and the vertical dimension describes the composition of formulae from 
subformulae. 
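The search semantics just described can be illustrated on a sampled view of a dense trace. The sketch assumes a right-continuous, piecewise-constant trace (consistent with the absence of instantaneous states) given as (start time, set of true propositions) segments. `search` finds the earliest point in the reflexive future, optionally bounded by the end of an enclosing interval, and `search_pattern` chains searches so that each begins where the previous one ended. A failed search returns None; whether that makes the enclosing formula true (weak search) or false (strong search) is left to the caller. Names and the sample trace are hypothetical.

```python
import math
from bisect import bisect_right


def search(trace, start, target, limit=math.inf):
    """Earliest point >= start (and < limit) where `target` holds, or None.

    `trace` is a list of (time, state) segments: each state holds on [time, next time),
    the last one forever; times start at 0 and strictly increase."""
    times = [t for t, _ in trace]
    i = max(bisect_right(times, start) - 1, 0)     # segment containing `start`
    for t, state in trace[i:]:
        point = max(t, start)
        if point >= limit:
            return None
        if target(state):
            return point
    return None


def search_pattern(trace, start, targets, limit=math.inf):
    """Compose searches sequentially; None means some search failed."""
    point = start
    for target in targets:
        point = search(trace, point, target, limit)
        if point is None:
            return None
    return point


def near(state):
    return "near" in state


def cross(state):
    return "cross" in state


def not_near(state):
    return "near" not in state


trace = [(0, set()), (2, {"near"}), (5, {"near", "cross"}), (9, set())]
print(search_pattern(trace, 0, [near, cross, not_near]))   # 9
```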

The example in figure 1 is a fragment of a road intersection specification. The state 
predicates near, cross and green are true, respectively, when a car is near an intersection,
when it is crossing the intersection and when the signal along that direction is green. It 
states that, if the signal is green whenever a car first approaches the intersection, it takes 
more than 3.5 seconds but at most 7 seconds to complete the crossing. 

Although graphical formulas such as that above are easier to read and understand than 
their textual counterparts, the rest of the paper will use a textual syntax for convenience of 
exposition. 

2.1 Syntax 

The sets of well-formed formulae (wffs), well-formed search patterns (wfsp), and well-formed interval modalities (wfim) of RTFIL are defined relative to a finite set V of primitive propositions by the following BNF grammar. We use f for a wff, $p \in V$ for a primitive proposition, θ for a wfsp, I for a wfim and $d \in Q$ (the set of non-negative rationals) for a duration, each possibly with a subscript.

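$$f ::= \mathit{true} \quad\mid\quad p \quad\mid\quad \mathrm{len}(0, d] \quad\mid\quad \neg f \quad\mid\quad f_1 \wedge f_2 \quad\mid\quad I f$$

$$I ::= [\to | \theta) \quad\mid\quad [\theta | \to) \quad\mid\quad [\theta_1 | \theta_2)$$

$$\theta ::= \to f \quad\mid\quad \to f, \theta$$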

Although θ in the syntax above does not include the trivial search patterns, in the sequel we shall use the meta-variable θ to mean any search pattern, trivial or non-trivial, unless explicitly noted otherwise.




\->near | —>) 0 (cross A 0—*cross)A 

near \->near,-across,-'cross)(green =>> len(3.5, 7]) 


Figure 2. Textual equivalent of the graphical specification in figure 1. 

We consider two special sets of well-formed strings when defining the semantics. The first, the set of wfsp, will be denoted by srchp(V), and the second, the set of wfim, will be denoted by imod(V). In addition, we shall call a wff purely propositional if it is formed by the following grammar:

$$f ::= \mathit{true} \quad\mid\quad p \quad\mid\quad \neg f \quad\mid\quad f_1 \wedge f_2$$

We use false as an abbreviation for ¬true, f ∨ g as an abbreviation for ¬(¬f ∧ ¬g), len(d, ∞) as an abbreviation for ¬len(0, d], and len(d₁, d₂] as an abbreviation for len(d₁, ∞) ∧ len(0, d₂]. The traditional temporal operators are defined by
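Read as predicates on the duration of the remaining interval, these duration abbreviations compose as in the following sketch. The function names are ours; math.inf stands for an interval of infinite duration and Fraction for rational durations. The bounds 3.5 and 7 are those of figure 1.

```python
import math
from fractions import Fraction


def len_upto(remaining, d):
    """len(0, d]: the remaining (suffix) interval lasts at most d."""
    return remaining <= d          # False whenever remaining is math.inf


def len_above(remaining, d):
    """len(d, ∞), defined as ¬len(0, d]."""
    return not len_upto(remaining, d)


def len_between(remaining, d1, d2):
    """len(d1, d2], defined as len(d1, ∞) ∧ len(0, d2]."""
    return len_above(remaining, d1) and len_upto(remaining, d2)


print(len_between(Fraction(7), Fraction(7, 2), Fraction(7)))   # True
print(len_between(math.inf, Fraction(7, 2), Fraction(7)))      # False
```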

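$$\Diamond f \overset{\text{def}}{=} \neg[\to f | \to)\,\mathit{false}$$

$$\Box f \overset{\text{def}}{=} [\to \neg f | \to)\,\mathit{false}$$

$$f\,\mathcal{U}\,g \overset{\text{def}}{=} [\to (\neg f \vee g) | \to)\,g$$

and so on.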

In FIL, the formula $I f$, with $I$ an interval modality and $f$ a formula, has the semantics "if the interval designated by $I$ exists, then $f$ holds at the initial state within that interval." Syntactically, RTFIL is just FIL extended with the timing primitives $\mathit{len}(0, d]$ for $d \in \mathbb{Q}$, the domain of durations. The formula $I\,\mathit{len}(0, d]$ has the natural interpretation that if the interval $I$ exists then its duration is no more than $d$. Intuitively, $\mathit{len}(0, d]$ asserts that the duration of the remaining (suffix) interval is at most $d$ time units. A search to $\mathit{len}(0, d]$ locates the earliest future point within the current interval such that the duration of the remaining interval is at most $d$. Consequently, over an interval of infinite duration $\mathit{len}(0, d]$ is never satisfied.²

A detailed description of the translation of graphical formulae to the textual syntax is 
beyond the scope of this paper (details appear in Dillon et al (1994)). For purposes of 
illustration, however, we note that the graphical formula given in figure 1 translates to the 
textual RTFIL formula given in figure 2. 


2.2 Models 


The models on which we interpret RTFIL formulae are partial functions from the non-negative reals $\mathbb{R}$ (the time domain) to states, which assign valuations to the primitive propositions. We represent a model by a total function $M : \mathbb{R} \to 2^{V} \cup \{\bot\}$, where $V$ is the set of primitive propositions and $\bot$ represents undefined.³ We require a model for RTFIL to satisfy the following requirement of admissibility.


2 The semantics of formulae containing wfim that involve timing primitives can be counterintuitive. Thus, while such 
formulae are decidable in the logic at no extra cost, their use should probably be avoided. 

3 We assume that all functions and predicates, except equality, are strict, i.e. if any of their arguments is $\bot$ then the result of the function or predicate is also $\bot$. For equality, however, we regard $\bot = \bot$ to be true, and $\bot = x$ and $x = \bot$ to be false if $x$ is not $\bot$.
DEFINITION 1

[Admissibility] A function $F : \mathbb{R} \to X \cup \{\bot\}$ is

- finitely variable iff, for any two elements $t_1 < t_2$ in $\mathbb{R}$, there are only finitely many changes in $F$ between $t_1$ and $t_2$

- right continuous iff for any $t \in \mathbb{R}$, $\lim_{t' \to t^+} F(t') = F(t)$

$F$ is admissible iff it is finitely variable, right continuous, $\mathrm{dom}\, F = \{t \in \mathbb{R} \mid F(t) \neq \bot\}$ is a left-closed right-open segment of $\mathbb{R}$, and $\mathrm{im}\, F = \{x \in X \mid \exists t \in \mathbb{R},\ F(t) = x\}$ is finite.
The above definitions of finite variability and right continuity are stated relative to an 
arbitrary valuation function on R in order that we can also use them with extended models 
(see theorem 3 in § 3.2 below), and not just with models. Note that finite variability implies 
discreteness, but finiteness of the image set is a stronger requirement. These definitions 
are equivalent to the standard ones in the literature. 

Intuitively, the domain of a model represents the interval (or "context") over which a formula is evaluated. Finite variability ensures that a system performs only a finite number of actions in any finite period of time and right continuity guarantees that a property can be observed only if it holds over an interval with a positive duration. Together these conditions imply that corresponding to every proposition $p$ there is a sequence $t_0, t_1, \ldots$ of time values, with $\lim_{i \to \infty} t_i = \infty$, that partition the time domain $\mathbb{R}$ into half-open segments $[t_i, t_{i+1})$ over which the valuation of $p$ is constant. We call any model satisfying the above properties an admissible model. We write $\bot_M$ for the everywhere undefined model $\bot_M : \mathbb{R} \to 2^V \cup \{\bot\}$, which satisfies $\mathrm{dom}(\bot_M) = \emptyset$ and is (trivially) admissible.
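A finitely variable, right-continuous valuation is naturally stored as a list of segment start points. The following Python sketch (the class name `PiecewiseModel` and the use of `None` for $\bot$ are our assumptions) represents models with finitely many segments; these are admissible by construction, but they form only a subclass, since an admissible model whose domain is all of $\mathbb{R}$ may change infinitely often. The lookup uses `bisect_right`, so the value at a breakpoint is the value of the segment starting there, which is exactly right continuity.

```python
import math
from bisect import bisect_right


class PiecewiseModel:
    """Hypothetical finite representation of an admissible model.

    states[i] is the set of propositions true on [breakpoints[i], breakpoints[i+1]),
    the last segment extends to `end`, and the domain is [breakpoints[0], end).
    """

    def __init__(self, breakpoints, states, end=math.inf):
        if not breakpoints or len(breakpoints) != len(states):
            raise ValueError("need one state per segment")
        if any(a >= b for a, b in zip(breakpoints, breakpoints[1:])) or breakpoints[-1] >= end:
            raise ValueError("breakpoints must be strictly increasing and below end")
        self.bp = list(breakpoints)
        self.states = [frozenset(s) for s in states]
        self.end = end

    def __call__(self, t):
        """M(t), or None (standing for the undefined value) outside the domain."""
        if t < self.bp[0] or t >= self.end:
            return None
        # bisect_right selects the segment whose left end is <= t, so the value at a
        # breakpoint is the value on the segment starting there: right continuity.
        return self.states[bisect_right(self.bp, t) - 1]
```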

An observation regarding the condition of right continuity is in order. The semantics of 
RTFIL can be generalized to admit models that are not right continuous. However, as long 
as the semantics are defined so as to be insensitive to instantaneous states, a formula will 
be satisfiable in the more general class of models precisely if it is satisfiable in the class of 
right continuous models. Moreover, the semantics of RTFIL are simpler to state and more 
intuitive if formulae are interpreted over right continuous models only. 

An admissible model M satisfies an RTFIL formula if the formula is true when evaluated 
at the initial state of M, where the valuation of formulae is defined below. If an admissible 
model represents an entire behaviour of a system, then its domain will be all of R. (To 
represent a terminating behaviour by such a model, the last state of the behaviour is 
stuttered.) However, in general, the domain of an admissible model may be any left-closed 
right-open segment of R. 

2.3 Semantics 

We now give a formal definition of the semantics of RTFIL, which have been explained 
informally above. The semantics are a natural extension of the FIL semantics (see 
Ramakrishna et al 1992). They are defined here with respect to a dense, rather than a 
discrete, time domain. Moreover, the syntax of FIL does not contain timing primitives, so 
that FIL formulae describe only constraints on the ordering of states. 

The semantics make use of the "locator" function $\lambda$ for locating the result of a search and the "constructor" function $C$ for constructing the subinterval, given the current interval and the states located by the searches. For brevity, we use $\mathbb{R}^{\bot,\infty}$ to denote $\mathbb{R} \cup \{\bot, \infty\}$ below.

DEFINITION 2 

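The search-locator function

$$
\lambda : \mathit{srchp}(V) \times (2^V \cup \{\bot\})^{\mathbb{R}} \times \mathbb{R}^{\bot,\infty} \to \mathbb{R}^{\bot,\infty}
$$

is defined by

- If $M = \bot_M$ or $t = \bot$ then $\lambda(\theta, (M, t)) = \bot$

- If $M \neq \bot_M$ and $t \neq \bot$ then

$$
\begin{aligned}
\lambda(-, (M, t)) &= t \\
\lambda(\rightarrow, (M, t)) &= \sup \mathrm{dom}\, M \\
\lambda(\rightarrow a, (M, t)) &= \begin{cases} \bot, & \text{if } (M, t') \not\models a \text{ for all } t' \geq t,\ t' \in \mathrm{dom}\, M \\ \inf\{t' \in \mathrm{dom}\, M \mid t' \geq t,\ (M, t') \models a\}, & \text{otherwise} \end{cases} \\
\lambda(\rightarrow a, \theta, (M, t)) &= \lambda(\theta, (M, \lambda(\rightarrow a, (M, t))))
\end{aligned}
$$

The model-constructor function

$$
C : \mathit{imod}(V) \times (2^V \cup \{\bot\})^{\mathbb{R}} \times \mathbb{R} \to (2^V \cup \{\bot\})^{\mathbb{R}}
$$

is defined by

$$
C([\theta_1 \mid \theta_2), (M, t)) = M_{\lambda(\theta_1, (M, t)),\, \lambda(\theta_2, (M, t))}
$$

where $M_{t_1, t_2}$, with $t_1, t_2 \in \mathbb{R}^{\bot,\infty}$, represents the subinterval model defined by $M_{t_1, \bot} = M_{\bot, t_2} = \bot_M$, and $M_{t_1, t_2}$ is the restriction of $M$ to $[t_1, t_2)$ if $t_1 \neq \bot$ and $t_2 \neq \bot$.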
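For models with finitely many segments and search targets that are purely propositional (so that their truth is constant on each segment), $\lambda$ and $C$ can be computed directly. The sketch below is ours: search targets are Python predicates on states, `None` plays the role of both $\bot$ and $\bot_M$, and both searches of $[\theta_1 \mid \theta_2)$ start at $t$, following the definition of $C$. An empty restriction (when $t_1 \geq t_2$) is returned as $\bot_M$.

```python
import math
from bisect import bisect_right

BOT = None  # stands for both the undefined point and the empty model


class Model:
    """Hypothetical finite model: states[i] holds on [bp[i], bp[i+1]); domain [bp[0], end)."""

    def __init__(self, bp, states, end=math.inf):
        self.bp, self.states, self.end = list(bp), [frozenset(s) for s in states], end

    def segment(self, t):
        return bisect_right(self.bp, t) - 1


def search(target, M, t):
    """lambda(->a, (M, t)): least t' >= t in dom M where target holds, else BOT."""
    i = M.segment(t)
    if target(M.states[i]):
        return t
    for j in range(i + 1, len(M.bp)):
        if target(M.states[j]):
            return M.bp[j]  # truth is constant on a segment, so the infimum is its start
    return BOT


def locate(theta, M, t):
    """lambda(theta, (M, t)) for "-", "->" or a tuple of predicates."""
    if M is BOT or t is BOT:
        return BOT
    if theta == "-":
        return t
    if theta == "->":
        return M.end  # sup dom M
    for target in theta:  # ->a1, ->a2, ...: each search starts where the last resolved
        t = search(target, M, t)
        if t is BOT:
            return BOT
    return t


def construct(left, right, M, t):
    """C([theta1 | theta2), (M, t)): restriction of M to [t1, t2), or BOT."""
    t1, t2 = locate(left, M, t), locate(right, M, t)
    if t1 is BOT or t2 is BOT or t1 >= t2:
        return BOT
    i = M.segment(t1)
    bp = [t1] + [b for b in M.bp[i + 1:] if b < t2]
    return Model(bp, M.states[i:i + len(bp)], t2)
```

For instance, on a model where `near` holds from 0 and `cross` holds on $[2, 5)$, `construct((near,), (cross, no_cross), M, 0)` yields the subinterval $[0, 5)$.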

DEFINITION 3 

[Semantics] The valuation of an RTFIL formula is defined at a point $t \in \mathrm{dom}\, M$ in an admissible model $M \in (2^V \cup \{\bot\})^{\mathbb{R}}$ using the satisfaction relation defined below.

If $M = \bot_M$ then

- $(M, t) \models f$

If $M \neq \bot_M$ then

- $(M, t) \models \mathit{true}$ and $(M, t) \not\models \mathit{false}$

- $(M, t) \models p$, for $p \in V$, iff $p \in M(t)$

- $(M, t) \models \neg f$ iff $(M, t) \not\models f$

- $(M, t) \models f \wedge g$ iff $(M, t) \models f$ and $(M, t) \models g$

- $(M, t) \models \mathit{len}(0, d]$ iff $t < \sup \mathrm{dom}\, M \leq t + d$

- $(M, t) \models I f$ iff $(M', \inf \mathrm{dom}\, M') \models f$, where $M' = C(I, (M, t))$

We say that $f$ is true at $t$ in $M$ iff $(M, t) \models f$ and that it is false otherwise.

A formula $f$ is satisfiable iff there exists an admissible model $M \in (2^V \cup \{\bot\})^{\mathbb{R}}$ such that $\mathrm{dom}\, M = \mathbb{R}$ and $(M, 0) \models f$. We then say that $M$ is a satisfying model for $f$. A formula $f$ is valid iff every admissible model $M$ for which $\mathrm{dom}\, M = \mathbb{R}$ is a satisfying model for $f$.
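Definition 3 can be turned into a small recursive checker when the model has finitely many segments and every search target is purely propositional. The following Python sketch is ours, with a tuple encoding of formulae (`("true",)`, `("p", name)`, `("len", d)`, `("not", f)`, `("and", f, g)`, `("mod", left, right, body)`) chosen for illustration; it follows the clauses above literally, including the vacuous truth of every formula on $\bot_M$ (represented by `None`). It is not a decision procedure: it only evaluates a formula on one given model.

```python
import math
from bisect import bisect_right

BOT = None  # the undefined point and the empty model


class Model:
    """Hypothetical finite model: states[i] holds on [bp[i], bp[i+1]); domain [bp[0], end)."""

    def __init__(self, bp, states, end=math.inf):
        self.bp, self.states, self.end = list(bp), [frozenset(s) for s in states], end

    def at(self, t):
        return self.states[bisect_right(self.bp, t) - 1]


def holds_prop(f, state):
    """Truth of a purely propositional formula in a single state."""
    op = f[0]
    if op == "true":
        return True
    if op == "p":
        return f[1] in state
    if op == "not":
        return not holds_prop(f[1], state)
    if op == "and":
        return holds_prop(f[1], state) and holds_prop(f[2], state)
    raise ValueError("this sketch only searches for purely propositional targets")


def locate(theta, M, t):
    """lambda(theta, (M, t)) for "-", "->" or a tuple of propositional targets."""
    if theta == "-":
        return t
    if theta == "->":
        return M.end                                   # sup dom M
    for target in theta:
        i = bisect_right(M.bp, t) - 1
        if holds_prop(target, M.states[i]):
            continue                                   # resolves at the current point
        later = [M.bp[j] for j in range(i + 1, len(M.bp))
                 if holds_prop(target, M.states[j])]
        if not later:
            return BOT                                 # the search fails
        t = later[0]
    return t


def construct(left, right, M, t):
    """C([theta1 | theta2), (M, t)); both searches start at t."""
    t1, t2 = locate(left, M, t), locate(right, M, t)
    if t1 is BOT or t2 is BOT or t1 >= t2:
        return BOT
    i = bisect_right(M.bp, t1) - 1
    bp = [t1] + [b for b in M.bp[i + 1:] if b < t2]
    return Model(bp, M.states[i:i + len(bp)], t2)


def sat(f, M, t):
    """(M, t) |= f, clause by clause as in Definition 3."""
    if M is BOT:
        return True                                    # every formula holds on the empty model
    op = f[0]
    if op == "true":
        return True
    if op == "p":
        return f[1] in M.at(t)
    if op == "len":
        return t < M.end <= t + f[1]
    if op == "not":
        return not sat(f[1], M, t)
    if op == "and":
        return sat(f[1], M, t) and sat(f[2], M, t)
    if op == "mod":
        sub = construct(f[1], f[2], M, t)
        return True if sub is BOT else sat(f[3], sub, sub.bp[0])
    raise ValueError(f"unknown operator {op!r}")
```

As a usage example, the formula $[\rightarrow \mathit{near} \mid \rightarrow \mathit{near}, \rightarrow \mathit{cross}, \rightarrow \neg\mathit{cross})(\mathit{green} \Rightarrow \mathit{len}(3.5, 7])$ holds at 0 on a model where the car is near from time 1 with a green signal and crosses on $[3, 6)$, and fails if the crossing only ends at time 9.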

The theorem below follows from the definition of admissibility and from the semantics, 
by induction on the structure of an RTFIL formula. The proof of this theorem is subsumed 
by that of theorem 3, which appears in the next section. 

Theorem 1. Let $f$ be any RTFIL formula and let $M$ be an admissible model. Then for any $t \in \mathbb{R}$, $(M, t) \models f$ iff there exists $\epsilon > 0$ such that for all $t \leq t' < t + \epsilon$, $(M, t') \models f$.

This theorem motivates the choice of $\mathit{len}(0, d]$ and, by negation, $\mathit{len}(d, \infty)$ as timing primitives. If, for instance, we had chosen $\mathit{len}[d, \infty)$ (with the intuitive semantics) as the basic timing primitive then, if $M$ is an admissible model with $\mathrm{dom}\, M = [0, 1)$, we would have $(M, 0) \models \mathit{len}[1, \infty)$ although $(M, t) \not\models \mathit{len}[1, \infty)$ for any $t > 0$, and theorem 1 would no longer be valid.
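The remark about $\mathit{len}[d, \infty)$ is easy to check numerically. In this Python sketch (ours) the model has domain $[0, 1)$, so $\sup \mathrm{dom}\, M = 1$; `len_ge` is the rejected primitive $\mathit{len}[d, \infty)$ with its intuitive reading, and `len_0_d` follows the clause for $\mathit{len}(0, d]$ in Definition 3. Sampling on a grid only illustrates the behaviour; it does not prove it.

```python
from fractions import Fraction

SUP_DOM = Fraction(1)  # the model's domain is [0, 1)


def len_0_d(t, d):
    """(M, t) |= len(0, d]  iff  t < sup dom M <= t + d."""
    return t < SUP_DOM <= t + d


def len_ge(t, d):
    """Rejected primitive len[d, oo): the remaining duration is at least d."""
    return SUP_DOM - t >= d


# Grid of sample points 0, 1/1000, ..., 999/1000 inside the domain.
samples = [Fraction(k, 1000) for k in range(1000)]

# len[1, oo) holds at 0 but at no later sample: its truth set {0} is not a
# union of left-closed right-open segments, so theorem 1 would fail.
ge_points = [t for t in samples if len_ge(t, 1)]

# len(0, 1/4] holds exactly from 3/4 onwards, a left-closed right-open set.
le_points = [t for t in samples if len_0_d(t, Fraction(1, 4))]
```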

The significance of theorem 1 springs from the fact that it ensures that any refinement mapping definable in RTFIL preserves admissibility. The absence of such a property would make refinement proofs difficult, since a refinement mapping on a given level might possibly produce an inadmissible model at the next lower level. This means that, at every stage, in order to apply further refinements, one would first have to prove that the previous mapping preserved admissibility. Moreover, it would overly restrict the applicable mappings.


3. Preliminaries 

This section introduces important concepts required by the decision procedure. In particular, it describes the timed automata used in deciding satisfiability and the various concepts of reductions and clocks used in the construction of the automata for the decision procedure.


3.1 Timed Büchi automata and timed ω-strings

The approach we use for our decision procedure is closely related to the procedure for the untimed logic in Ramakrishna et al (1992). The first step in that procedure is the construction of a Büchi Automaton (BA) for a formula, such that the formula is satisfied iff the automaton has a non-empty language. This is the basic automata-theoretic approach (Wolper 1987). However, since RTFIL deals with real time rather than only order relations, the notion of automata on infinite strings is now extended to that of timed automata on timed strings.
DEFINITION 4

[Timed ω-string] A timed ω-string over the alphabet $\Sigma$ is an infinite sequence $((\sigma_i, t_i))_{i \in \omega}$ in $(\Sigma \times \mathbb{R})^{\omega}$ such that $(t_i)_{i \in \omega}$ is an unbounded strictly monotonically increasing sequence, with $t_0 > 0$.

Observe that an admissible model for our logic identifies a timed string over the alphabet $2^V$. In fact, in some of our subsequent proofs we shall use this representation for RTFIL models rather than the one we gave in the previous section.

The following definition of Timed Büchi Automaton (TBA) is a special case of the TBA described in Alur & Dill (1990).

DEFINITION 5 

A Timed Büchi Automaton $A$ is a tuple $(\Sigma, S, C, \rho, S_I, S_F)$ where

- $\Sigma$ is a finite input alphabet

- $S$ is a finite set of states

- $C$ is a finite set of clocks

- $\rho : S \times \Sigma \to 2^{S \times 2^C \times 2^{\Phi(C)}}$ is the transition function, where $\Phi(C)$, the set of clock conditions, is the set of inequalities of the form $c < t$ and $c = t$, for $c \in C$ and $t \in \mathbb{Q}$

- $S_I \subseteq S$ is the set of possible initial states

- $S_F \subseteq S$ is the set of accepting states.

The transition function $\rho$ defines for each state $s$ and input $\sigma$ a set of triples, where each triple $(s', C', \phi) \in \rho(s, \sigma)$ specifies a next state $s'$, a set $C'$ of clocks reset with that transition and a set $\phi$ of clock conditions that must be satisfied at the moment of the transition. We say that a clock assignment $\gamma \in \mathbb{R}^C$ satisfies a set of clock conditions $\phi \subseteq \Phi(C)$ iff the set of inequalities $\phi[c \leftarrow \gamma(c)]$ obtained by replacing each clock variable $c$ in $\phi$ by the corresponding value $\gamma(c)$ is satisfied.⁴ If $(s', C', \phi) \in \rho(s, \sigma)$, we say that $\rho$ allows the transition $s \xrightarrow{\sigma, C', \phi} s'$.

A run of $A$ on an ω-string $\sigma = ((\sigma_i, t_i))_{i \in \omega} \in (\Sigma \times \mathbb{R})^{\omega}$ is an ω-string $\mathcal{R}(A, \sigma) = ((s_i, \gamma_i))_{i \in \omega} \in (S \times \mathbb{R}^C)^{\omega}$ satisfying

- Initiality: $s_0 \in S_I$, and for all $c \in C$, $\gamma_0(c) = 0$

- Transitions: for each $i$, there is a set of clocks $C_i \subseteq C$ and a finite set $\phi_i \subseteq \Phi(C)$ of clock conditions such that

  - $\rho$ allows the transition $s_i \xrightarrow{\sigma_i, C_i, \phi_i} s_{i+1}$

  - the inequalities in $\phi_i[c \leftarrow \gamma_i(c) + t_i - t_{i-1}]$ are satisfied, where $t_{-1} = 0$

  - $\gamma_{i+1}(c) = 0$ for all $c \in C_i$

  - $\gamma_{i+1}(c) = \gamma_i(c) + t_i - t_{i-1}$ for all $c \in C \setminus C_i$


4 As usual the empty set of conditions imposes no conditions and so is always satisfied. 
We write $(s_0, \gamma_0) \rightarrow (s_1, \gamma_1) \rightarrow \cdots$ when these conditions hold. Such a run is accepting iff the set $\{i \mid s_i \in S_F\}$ is infinite. The language of a TBA is non-empty iff there is a timed ω-string over its alphabet on which it has an accepting run.
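The Transitions clause can be read as a step function on configurations $(s_i, \gamma_i)$. The following Python sketch (ours) searches, with backtracking, for a run on a finite prefix of a timed word; it therefore checks only that the prefix can be consumed, not Büchi acceptance, which concerns infinite runs. The encoding of $\rho$ as a dictionary from $(s, \sigma)$ to lists of triples $(s', C', \phi)$, with clock conditions written as `(clock, op, const)` and `op` one of `"<"` and `"="`, is a hypothetical choice made for illustration.

```python
# Sketch only: run construction for a TBA on a finite timed-word prefix.

def satisfied(conds, clocks):
    """Clock conditions are triples (clock, op, const) with op "<" or "="."""
    return all(clocks[c] < k if op == "<" else clocks[c] == k for c, op, k in conds)


def find_run(rho, initial, clock_names, word):
    """Return configurations [(s_0, g_0), ..., (s_n, g_n)] of a run on the
    finite timed word [(sigma_0, t_0), ..., (sigma_{n-1}, t_{n-1})], or None."""

    def extend(i, state, clocks, prev_t, acc):
        if i == len(word):
            return acc
        sigma, t_i = word[i]
        # Clocks advance by the time t_i - t_{i-1} spent in s_i (t_{-1} = 0).
        advanced = {c: v + (t_i - prev_t) for c, v in clocks.items()}
        for nxt, resets, conds in rho.get((state, sigma), []):
            if satisfied(conds, advanced):
                new = {c: (0 if c in resets else advanced[c]) for c in clock_names}
                run = extend(i + 1, nxt, new, t_i, acc + [(nxt, new)])
                if run is not None:
                    return run
        return None  # no allowed transition works here: backtrack

    for s0 in initial:
        zero = {c: 0 for c in clock_names}
        run = extend(0, s0, zero, 0, [(s0, zero)])
        if run is not None:
            return run
    return None
```

With one clock $x$, a transition on `a` that resets $x$ and a transition on `b` guarded by $x < 2$, the word `[("a", 1), ("b", 2.5)]` is consumed while `[("a", 1), ("b", 4)]` is not.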

Intuitively, a TBA reads a timed Σ-string and makes transitions satisfying its transition function. It has a finite set of clocks, which proceed at the same rate, and which it can reset with a transition or compare with rational constants. Transitions must satisfy the associated clock conditions for the input string to be consumed. The operational semantics of the run shown above are that the automaton stays in state $s_i$ at time $t$, $t_{i-1} \leq t < t_i$. At time $t_i$ it moves into state $s_{i+1}$ resetting the clocks in $C_i$. The remaining clocks have meanwhile advanced by the time spent in $s_i$. The semantics of the input string $\sigma$ are that it is a model $M_\sigma$ such that for $t_{i-1} \leq t < t_i$ and $i \in \omega$, $M_\sigma(t) = \sigma_i$. We say that the TBA $A$ consumes a timed Σ-string when there exists a run of $A$ on the string and that it accepts the string when some such run is accepting.

We note for the sequel that a BA can be regarded as a TBA whose set of clocks is empty. We take this as our definition of BA below. Because the set of clocks of a BA is empty, its transition function is regarded as a function $\rho : S \times \Sigma \to 2^S$, and it ignores the timing information on a timed ω-string.

DEFINITION 6 

[Untiming] We define a polymorphic untiming function as follows

- When given a timed ω-string $(\sigma_i, t_i)_{i \in \omega}$, it returns the untimed ω-string

$$
\mathit{untime}((\sigma_i, t_i)_{i \in \omega}) = (\sigma_i)_{i \in \omega}
$$

- When given a TBA $A = (\Sigma, S, C, \rho, S_I, S_F)$, it returns the BA

$$
\mathit{untime}(A) = (\Sigma, S, \rho', S_I, S_F)
$$

where the transition function $\rho' : S \times \Sigma \to 2^S$ is defined by $\rho'(s, \sigma) = \{s' \mid (s', C', \phi) \in \rho(s, \sigma)\}$.
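Untiming simply forgets timestamps, clock resets and clock conditions. A Python sketch over finite data, reusing a hypothetical dictionary encoding of $\rho$ (keys $(s, \sigma)$, values lists of triples $(s', C', \phi)$), is:

```python
# Sketch only: the polymorphic untiming function of Definition 6, split into two helpers.

def untime_string(timed):
    """untime(((sigma_i, t_i))_i) = (sigma_i)_i, on a finite prefix."""
    return [sigma for sigma, _t in timed]


def untime_tba(tba):
    """untime((Sigma, S, C, rho, S_I, S_F)) = (Sigma, S, rho', S_I, S_F),
    where rho'(s, sigma) = {s' | (s', C', phi) in rho(s, sigma)}."""
    alphabet, states, _clocks, rho, initial, accepting = tba
    rho_prime = {key: {nxt for nxt, _resets, _conds in triples}
                 for key, triples in rho.items()}
    return alphabet, states, rho_prime, initial, accepting
```

Since every transition of $A$ survives in $\rho'$ and the accepting states are unchanged, the state sequence of any accepting run of $A$ is also an accepting run of $\mathit{untime}(A)$ on the untimed string, which is the content of lemma 1.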

Lemma 1. For a timed ω-string $\sigma$ and TBA $A$, if $A$ accepts $\sigma$ then $\mathit{untime}(A)$ accepts $\mathit{untime}(\sigma)$.

Proof. The statement follows immediately from the definition of untiming. □ 

Observe that the admissibility requirement makes the acceptance criterion for our TBAs slightly more restrictive than that in Alur & Dill (1990). However, it is not difficult to see that, because of our more restricted edge conditions $\Phi$, if there is any accepting run of the TBA by the less restrictive definition of Alur & Dill (1990), there is also an admissible model on which the TBA has an accepting run by our definition.⁵ Thus, the emptiness algorithm of Alur & Dill (1990) suffices for our purpose.


5 The latter model is obtained by simply closing each interval of the former model on the left (and opening its 
successor interval on the right) - that this does not violate any of the transition conditions in the course of an 
identical run of our automaton on the latter model is easy to establish, using the fact that edge conditions are of the 
form c = t or c < t only. 
Theorem 2 (Alur & Dill 1990). It is decidable whether the language of a TBA is empty.

3.2 Subformulae, reductions and extensions

The concepts of subformula closure set, reductor set and reductions on interval formulae for FIL were introduced in Ramakrishna et al (1992). The first is well known in automata-theoretic approaches to conventional temporal logics. The latter two were introduced to simplify the statement of the FIL decision procedure and correspond, roughly, to the so-called rewrite rules used in the method of semantic tableaux.

The definitions that follow are straightforward extensions of those appearing in Ramakrishna et al (1992) to take into account the presence of duration formulae, i.e. those involving predicates of the form $\mathit{len}(0, d]$.

The subformula closure $\mathit{scl}(f)$ captures the idea that in deciding the satisfiability of the formula $f$, one need only consider formulae in the set $\mathit{scl}(f)$. The formulae in the set intuitively represent all the "verification conditions" arising in an on-line strategy to verify if $f$ is satisfied by an arbitrary model. As in Fischer & Ladner (1977) our closure is an extended subformula closure, sometimes also called Fischer-Ladner closure, in the sense that $\mathit{scl}(f)$ may contain formulae that are not syntactic subformulae of $f$.

Notation 1. Let $I$ be an interval modality and let $F$ be a set of formulae. Then $I.F$ denotes the set of formulae $\{I f \mid f \in F\}$. If $F$ is empty then so is $I.F$.

DEFINITION 7 

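[Subformula Closure] The subformula closure $\mathit{scl}(f)$ of a formula $f$ is the smallest set such that⁶

a) $f \in \mathit{scl}(f)$.

b) $\mathit{true} \in \mathit{scl}(f)$ and $\mathit{false} \in \mathit{scl}(f)$.

c) $f_1 \in \mathit{scl}(f)$ iff $\neg f_1 \in \mathit{scl}(f)$.

d) if $f_1 \wedge f_2 \in \mathit{scl}(f)$ then $f_1 \in \mathit{scl}(f)$ and $f_2 \in \mathit{scl}(f)$.

e) if $[\rightarrow a, \theta_1 \mid \theta_2)f_1 \in \mathit{scl}(f)$ or $[\theta_1 \mid \rightarrow a, \theta_2)f_1 \in \mathit{scl}(f)$ then $[\theta_1 \mid \theta_2)f_1 \in \mathit{scl}(f)$.

f) if $[\rightarrow a \mid \theta_2)f_1 \in \mathit{scl}(f)$ then if $\theta_2$ is not $\rightarrow$ then $[- \mid \theta_2)f_1 \in \mathit{scl}(f)$, and if $\theta_2$ is $\rightarrow$ then $f_1 \in \mathit{scl}(f)$.

g) if any of $[\rightarrow a, \theta_1 \mid \theta_2)f_1$, $[\theta_1 \mid \rightarrow a, \theta_2)f_1$, $[\rightarrow a \mid \theta_2)f_1$, or $[\theta_1 \mid \rightarrow a)f_1$ is in $\mathit{scl}(f)$ then $a \in \mathit{scl}(f)$.

h) if $[\theta_1 \mid \theta_2)f_1 \in \mathit{scl}(f)$ then

- if $\theta_1$ is not $-$ then $[\theta_1 \mid \rightarrow)\mathit{false} \in \mathit{scl}(f)$, and

- if $\theta_2$ is not $\rightarrow$ then $[\theta_2 \mid \rightarrow)\mathit{false} \in \mathit{scl}(f)$.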

6 As usual we identify $\neg\neg f_1$ with $f_1$, $\neg\mathit{true}$ with $\mathit{false}$ and $\neg\mathit{false}$ with $\mathit{true}$.
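$$
\begin{aligned}
f &\stackrel{\text{def}}{=} [\rightarrow p \mid \rightarrow p, \rightarrow q)\neg\mathit{len}(0,3] \\
f_1 &\stackrel{\text{def}}{=} [- \mid \rightarrow p, \rightarrow q)\neg\mathit{len}(0,3] \\
f_2 &\stackrel{\text{def}}{=} [- \mid \rightarrow p, \rightarrow q)\mathit{len}(0,3] \\
f_3 &\stackrel{\text{def}}{=} [\rightarrow p \mid \rightarrow q)\neg\mathit{len}(0,3] \\
f_4 &\stackrel{\text{def}}{=} [- \mid \rightarrow p, \rightarrow q)\mathit{false} \\
f_5 &\stackrel{\text{def}}{=} [- \mid \rightarrow p, \rightarrow q)\mathit{true} \\
f_6 &\stackrel{\text{def}}{=} [- \mid \rightarrow q)\neg\mathit{len}(0,3] \\
f_7 &\stackrel{\text{def}}{=} [- \mid \rightarrow q)\mathit{len}(0,3] \\
f_8 &\stackrel{\text{def}}{=} [\rightarrow p, \rightarrow q \mid \rightarrow)\mathit{false} \\
f_9 &\stackrel{\text{def}}{=} [- \mid \rightarrow q)\mathit{false} \\
f_{10} &\stackrel{\text{def}}{=} [- \mid \rightarrow q)\mathit{true} \\
f_{11} &\stackrel{\text{def}}{=} [\rightarrow q \mid \rightarrow)\mathit{false} \\
f_{12} &\stackrel{\text{def}}{=} [\rightarrow p \mid \rightarrow)\mathit{false}
\end{aligned}
$$

[Hasse diagram over f, f1, ..., f12, p, q and true not reproduced]

Figure 3. Example illustrating the subformula closure definition.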


i) if $[- \mid \theta)f_1 \in \mathit{scl}(f)$ and $f_1$ is purely propositional then $f_1 \in \mathit{scl}(f)$ (recall that duration predicates are not purely propositional)

j) if $[- \mid \theta)f_1 \in \mathit{scl}(f)$ then $[- \mid \theta).\mathit{scl}(f_1) \subseteq \mathit{scl}(f)$.

Example 1. Let $f$ be the formula $[\rightarrow p \mid \rightarrow p, \rightarrow q)\neg\mathit{len}(0, 3]$, where $p, q \in V$, and let $f_1, \ldots, f_{12}$ represent the subformulae as shown in figure 3. Then $\mathit{scl}(f)$ consists of precisely the formulae $f, f_1, \ldots, f_{12}, p, q, \mathit{true}$ and all their negations. This is shown in the figure in the form of a Hasse diagram, where a formula $f'$ (and its negation) is in the subformula closure of another formula $f''$ (or its negation) if either $f'$ and $f''$ are at the same "node" (such as, for instance, $f_1$ and $f_2$) or if $f'$ is below $f''$ and reachable from it (such as, for instance, $f_{11}$ and $f_1$). We assume that at every node, a formula and its negation are both present although, for clarity, we do not explicitly show the negation.
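Definition 7 is effectively a worklist algorithm. The Python sketch below (ours) uses a tuple encoding of formulae in which `"-"` and `"->"` are the trivial search patterns, a tuple of formulae is a non-trivial search pattern, and $\mathit{false}$ is `("not", ("true",))`; it applies rules a) to j) until nothing new is added, and collapsing $\neg\neg f$ to $f$ implements footnote 6. On the formula of example 1 it yields the 16 formulae $f, f_1, \ldots, f_{12}, p, q, \mathit{true}$ together with their negations.

```python
# Sketch only: subformula closure scl(f) with a hypothetical tuple encoding.
TRUE = ("true",)


def neg(f):
    """Footnote 6: not-not-f is identified with f, not-true with false."""
    return f[1] if f[0] == "not" else ("not", f)


FALSE = neg(TRUE)


def purely_propositional(f):
    if f[0] in ("true", "p"):
        return True
    if f[0] == "not":
        return purely_propositional(f[1])
    if f[0] == "and":
        return purely_propositional(f[1]) and purely_propositional(f[2])
    return False  # len(0, d] and interval formulae


def scl(f):
    """Smallest set containing f and closed under rules a)-j) of Definition 7."""
    closure, work = set(), []

    def add(g):
        for h in (g, neg(g)):                      # rule c)
            if h not in closure:
                closure.add(h)
                work.append(h)

    add(f)                                         # rule a)
    add(TRUE)                                      # rule b); false follows by c)
    while work:
        g = work.pop()
        if g[0] == "and":                          # rule d)
            add(g[1])
            add(g[2])
        if g[0] != "mod":
            continue
        _, left, right, body = g
        if isinstance(left, tuple) and len(left) > 1:    # rule e), left search
            add(("mod", left[1:], right, body))
        if isinstance(right, tuple) and len(right) > 1:  # rule e), right search
            add(("mod", left, right[1:], body))
        if isinstance(left, tuple) and len(left) == 1:   # rule f)
            add(("mod", "-", right, body) if right != "->" else body)
        for side in (left, right):                       # rule g)
            if isinstance(side, tuple):
                add(side[0])
        if left != "-":                                  # rule h)
            add(("mod", left, "->", FALSE))
        if right != "->":
            add(("mod", right, "->", FALSE))
        if left == "-":
            if purely_propositional(body):               # rule i)
                add(body)
            for h in scl(body):                          # rule j)
                add(("mod", "-", right, h))
    return closure
```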

Consider now a model $M$ and $t \in \mathbb{R}$, such that the formulae $p$ and $[\rightarrow p \mid \rightarrow p, \rightarrow q)\neg\mathit{len}(0, d]$ are both satisfied at $(M, t)$. Clearly, the "search" to $p$ starting at $t$ will locate the current point, so that, as a result, the formula $[- \mid \rightarrow q)\neg\mathit{len}(0, d]$ must also hold at $t$. Moreover, the formula $[- \mid \rightarrow q)\mathit{len}(0, d]$ must not hold at $t$, unless either $q$ holds at $t$ (the interval "collapses") or $q$ never holds for any $t' > t$ (the search "fails"), i.e. unless $q$ or $[\rightarrow q \mid \rightarrow)\mathit{false}$ also holds at $t$. This notion of a set of formulae forcing the truth of other formulae is closely related to the concept of (finitary) "forcing" in descriptive set theory, and motivates the following series of definitions, culminating with lemma 2.

DEFINITION 8 

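[Reductor Set] The reductor set $\mathit{red}(f)$ of a formula $f$ is the smallest set of wff, not containing $f$, such that

- if $f$ is of the form $[\rightarrow a, \theta_1 \mid \theta_2)f_1$, $[\rightarrow a \mid \theta_2)f_1$, $[\theta_1 \mid \rightarrow a, \theta_2)f_1$ or $[\theta_1 \mid \rightarrow a)f_1$ then $a \in \mathit{red}(f)$

- if $f$ is of the form $\neg f_1$ then $\mathit{red}(f_1) \subseteq \mathit{red}(f)$

- if $f$ is of the form $[\theta_1 \mid \theta_2)f_1$ then

  - if $\theta_1$ is not $-$ then $[\theta_1 \mid \rightarrow)\mathit{false} \in \mathit{red}(f)$ and

  - if $\theta_2$ is not $\rightarrow$ then $[\theta_2 \mid \rightarrow)\mathit{false} \in \mathit{red}(f)$

- if $f$ is of the form $[- \mid \theta_2)f_1$ then $[- \mid \theta_2).\mathit{red}(f_1) \subseteq \mathit{red}(f)$

- if $f$ is of any other form then $\mathit{red}(f) = \emptyset$.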

DEFINITION 9 

[Reducibility] Let $a$ and $f$ be formulae. Then $f$ is $a$-reducible iff $a \in \mathit{red}(f)$. Otherwise it is $a$-irreducible. If $S$ is a set of formulae, then $f$ is $S$-reducible iff it is $a$-reducible for some $a \in S$.

DEFINITION 10 

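[Reduction] Let $a, f$ be such that $a \in \mathit{red}(f)$. Then the wff $f'$ is an $a$-reduct of $f$, written $f' \prec_a f$, iff one of the following holds

- $f$ is of the form $[\rightarrow a, \theta_1 \mid \theta_2)f_1$ or $[\theta_1 \mid \rightarrow a, \theta_2)f_1$ and $f'$ is $[\theta_1 \mid \theta_2)f_1$

- $f$ is of the form $[\rightarrow a \mid \theta_2)f_1$ and $f'$ is $[- \mid \theta_2)f_1$

- $f$ is of the form $[\theta_1 \mid \rightarrow a)f_1$ and $f'$ is $\mathit{true}$

- $f$ is of the form $[\theta_1 \mid \theta_2)f_1$, $a$ is either $[\theta_1 \mid \rightarrow)\mathit{false}$ or $[\theta_2 \mid \rightarrow)\mathit{false}$, and $f'$ is $\mathit{true}$

- $f$ is of the form $[\rightarrow a \mid \rightarrow)f_1$ and $f'$ is $f_1$

- $f$ is of the form $\neg f_1$, $f'$ is $\neg f_1'$ and $f_1' \prec_a f_1$

- $f$ is of the form $[- \mid \theta_2)f_1$, $a$ is $[- \mid \theta_2)b$, $f'$ is $[- \mid \theta_2)f_1'$ and $f_1' \prec_b f_1$

When $f$ is reducible to $f'$ through a chain of reductions with respect to formulae in a set $S$, we say $f' \prec^*_S f$.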

Example 2. Continuing with example 1, figure 4 illustrates the definitions just given. In the figure, if a formula $f'$ is reachable from a formula $f''$ by a direct edge labelled with a formula $a$, then $f' \prec_a f''$. Thus, the fanout labels of a node $f''$ are precisely the formulae in $\mathit{red}(f'')$. For instance, $f$ is $p$-reducible but $q$-irreducible. Moreover, $p$ transitively reduces $f$ to $f_6$. This reduced formula is now $q$-reducible, so that $\mathit{true} \prec^*_{\{p, q\}} f$. Note also that each of $f_8$ and $f_{12}$ directly reduces $f$ to $\mathit{true}$.
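Definitions 8 and 10 can be sketched in the same kind of hypothetical tuple encoding. Here `red` computes the reductor set, `reducts(f, a)` returns every $f'$ with $f' \prec_a f$, and `reduce_star` collects everything reachable by chains of reductions with respect to a set $S$, including $f$ itself (the empty chain). Run on the formula of example 1, it reproduces the observations of example 2.

```python
# Sketch only: reductor sets and reductions, using a hypothetical tuple encoding.
TRUE = ("true",)


def neg(f):
    return f[1] if f[0] == "not" else ("not", f)


FALSE = neg(TRUE)


def red(f):
    """Reductor set red(f) of Definition 8."""
    out = set()
    if f[0] == "not":
        out |= red(f[1])
    elif f[0] == "mod":
        _, left, right, body = f
        for side in (left, right):          # first target of a non-trivial search
            if isinstance(side, tuple):
                out.add(side[0])
        if left != "-":
            out.add(("mod", left, "->", FALSE))
        if right != "->":
            out.add(("mod", right, "->", FALSE))
        if left == "-":
            out |= {("mod", "-", right, b) for b in red(body)}
    out.discard(f)                          # red(f) never contains f itself
    return out


def reducts(f, a):
    """All f' with f' <_a f (Definition 10); empty when f is a-irreducible."""
    if a not in red(f):
        return set()
    if f[0] == "not":
        return {neg(g) for g in reducts(f[1], a)}
    _, left, right, body = f
    out = set()
    if isinstance(left, tuple) and left[0] == a:
        if len(left) > 1:
            out.add(("mod", left[1:], right, body))
        elif right != "->":
            out.add(("mod", "-", right, body))
        else:
            out.add(body)
    if isinstance(right, tuple) and right[0] == a:
        out.add(("mod", left, right[1:], body) if len(right) > 1 else TRUE)
    if a in (("mod", left, "->", FALSE), ("mod", right, "->", FALSE)):
        out.add(TRUE)
    if left == "-" and a[0] == "mod" and a[1] == "-" and a[2] == right:
        out |= {("mod", "-", right, g) for g in reducts(body, a[3])}
    return out


def reduce_star(f, S):
    """Every f' with f' <*_S f, including f itself (the empty chain)."""
    seen, work = {f}, [f]
    while work:
        g = work.pop()
        for a in S:
            for h in reducts(g, a):
                if h not in seen:
                    seen.add(h)
                    work.append(h)
    return seen
```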

Observe that for a wff $a$, the parameterized reduction operator $\prec_a$ on wffs has been defined so that $f' \prec_a f$ guarantees that $a \Rightarrow (f' \equiv f)$ and $\mathit{scl}(f') \subseteq \mathit{scl}(f)$. This is formalized in the next lemma, which helps motivate the construction of the untimed automaton described in § 4.2.

Lemma 2. Let $f, f', a$ be formulae and $M$ be a model such that $(M, t) \models a$ and $f' \prec_a f$. Then $(M, t) \models f$ iff $(M, t) \models f'$.

Figure 4. Example illustrating reductions.

Proof. The proof is by a case analysis of the reduction rule used in the reduction (definition 10). The presence of the last rule requires an induction. We use induction on the number of applications of the last rule in the reduction. The base case involves an application of one of the earlier rules, which can be proved easily using the semantics of FIL.

For the induction step, let $f$ be $[- \mid \theta_2)f_1$, $a$ be $[- \mid \theta_2)b$ and $f'$ be $[- \mid \theta_2)f_1'$. To establish the forwards implication, assume that $(M, t) \models f$, $(M, t) \models a$ and $f' \prec_a f$. Using the semantics of the logic, it is clear that $(M', t) \models f_1$ and $(M', t) \models b$, where $M' = C([- \mid \theta_2), (M, t))$. But, by the definition of reduction, $f_1' \prec_b f_1$, so that by the induction hypothesis $(M', t) \models f_1'$. Again, from the semantics, we have $(M, t) \models [- \mid \theta_2)f_1'$ as required. The backwards implication follows similarly. □

COROLLARY 1 

Let $(M, t) \models a$ for all $a \in S$ and let $f' \prec^*_S f$. Then $(M, t) \models f$ iff $(M, t) \models f'$.

Example 3. Observe, in our running example, that $p \Rightarrow (f \equiv f_1)$, $(p \wedge q) \Rightarrow f$ and $f_8 \Rightarrow f$. Note also that, for a formula $f$, the formulae which are the (transitive) reducts of $f$ give rise to a complete lattice under the relation "is a reduct of."

We have so far represented models as mappings from R to the powerset of primitive 
propositions. It is a useful abstraction for the description of the decision procedure and for 
the subsequent correctness proofs to extend this mapping so that it provides valuations to 
every formula in scl(/). 

DEFINITION 11 

[Model Extension] Given an admissible model $M \in (2^V)^{\mathbb{R}}$, its extension with respect to an RTFIL formula $f$ is the function $M^f : \mathbb{R} \to 2^{\mathit{scl}(f)}$ satisfying $M^f(t) = \{f_1 \mid f_1 \in \mathit{scl}(f),\ (M, t) \models f_1\}$. We call $M^f$ an extended model, and each set $M^f(t)$ an extended state.


It is easy to see that for an arbitrary model, extension is well-defined and, thus, that corresponding to every model there is a unique extension with respect to a given formula. Moreover, every state $M^f(t)$ in an extended model $M^f$ is

- consistent in the sense that for any formula $f' \in \mathit{scl}(f)$, $f' \in M^f(t)$ only if $\neg f' \notin M^f(t)$

- complete (up to elements in $\mathit{scl}(f)$) in the sense that for any formula $f' \in \mathit{scl}(f)$ either $f' \in M^f(t)$ or $\neg f' \in M^f(t)$.

Theorem 3. Admissibility of models is preserved under extension. 

Recall that the real line is partitioned by any primitive proposition $p$ into a sequence of segments over which the valuation of $p$ is constant. We may extend this concept to arbitrary subsets of formulae in $\mathit{scl}(f)$, such that two points $t_1 < t_2 \in \mathbb{R}$ are in the same equivalence class iff all points $t$ such that $t_1 \leq t \leq t_2$ yield the same valuation for all formulae in the set. Intuitively, our proof of theorem 3 uses the fact that the partition of the real line induced by any RTFIL formula $f$, not involving duration predicates, is at most as fine as the coarsest partition that refines the partitions induced by the formulae in $\mathit{scl}(f) \setminus \{f, \neg f\}$. Moreover, if every equivalence class belonging to one of a finite set of partitions is left-closed and right-open, then so is every equivalence class in the coarsest partition that refines these partitions. For formulae containing duration predicates, we note that there is at most one (right-continuous) change in the valuation of a duration predicate in any finite segment of $\mathbb{R}$, and no change in any infinite segment of $\mathbb{R}$.

The proof of theorem 3 makes heavy use of the following lemma, the proof of which is 
straightforward. 

Lemma 3. Let $X_1$ and $X_2$ be finitely variable and right continuous functions from $\mathbb{R}$ to finite subsets of a set $S$. Let $P(b_1, \ldots, b_n)$ be a boolean function of $n$ variables $b_1, \ldots, b_n$, and let $x_1, \ldots, x_n$ be elements of $S$. Then the functions⁷

1. $X : \mathbb{R} \to 2^S$ defined by $X(t) = X_1(t) \cup X_2(t)$

2. $B : \mathbb{R} \to \{\mathit{true}, \mathit{false}\}$ defined by $B(t) = P[b_i \leftarrow (x_i \in X_1(t))]_i$

are also finitely variable (FV) and right-continuous (RC).
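Both clauses of lemma 3 rest on the observation that a pointwise combination of piecewise-constant functions can change only where one of its arguments changes. For functions with finitely many segments this can be sketched in Python as follows; the representation as a pair of breakpoint and value lists, with both functions sharing the same first breakpoint, is our assumption.

```python
from bisect import bisect_right


def value_at(fn, t):
    """Right-continuous lookup: vals[i] holds on [bp[i], bp[i+1])."""
    bp, vals = fn
    return vals[bisect_right(bp, t) - 1]


def combine(f1, f2, op):
    """Pointwise op of two piecewise-constant functions with the same start.

    The result can only change at a breakpoint of f1 or f2, which is why
    finite variability and right continuity are preserved.
    """
    bp = sorted(set(f1[0]) | set(f2[0]))  # candidate change points
    vals = [op(value_at(f1, t), value_at(f2, t)) for t in bp]
    # Keep only the breakpoints where the combined value really changes.
    keep = [0] + [i for i in range(1, len(bp)) if vals[i] != vals[i - 1]]
    return [bp[i] for i in keep], [vals[i] for i in keep]


X1 = ([0, 1, 3], [{"a"}, {"a", "b"}, set()])
X2 = ([0, 2], [{"c"}, set()])

# Clause 1: X(t) = X1(t) | X2(t).
X = combine(X1, X2, lambda u, v: u | v)

# Clause 2: B(t) = P[b1 <- ("a" in X1(t)), b2 <- ("b" in X1(t))] with P = b1 and not b2.
B = combine(X1, X1, lambda u, _: "a" in u and "b" not in u)
```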

Proof of theorem 3. Let $M$ be an admissible model. Then $\mathrm{dom}\, M^f = \mathrm{dom}\, M$. Moreover, since $\mathit{scl}(f)$ is finite for any formula $f$, clearly $M^f$ is image finite. It remains to prove that $M^f$ is right-continuous and finitely variable. The proof is by induction on the inclusion order induced by the subformula closure.

For the first of two base cases, we note that

$$
M^p(t) = \begin{cases} \{\mathit{true}, p\} & \text{if } p \in M(t) \\ \{\mathit{true}, \neg p\} & \text{otherwise} \end{cases}
$$

Finite variability and right continuity of $M^p$ then follow easily from those of $M$ for any $p \in V$.


7 The abbreviation $P[x_i \leftarrow y_i]_i$ denotes simultaneous substitution of $y_i$ for $x_i$, for every $i$.
For the remaining base case, we note that $\sup \mathrm{dom}\, M \neq t$ for $t \in \mathrm{dom}\, M$, so that, for any $t \in \mathrm{dom}(M)$, $d \in \mathbb{Q}$,

$$
M^{\mathit{len}(0,d]}(t) = \begin{cases} \{\mathit{true}, \mathit{len}(0,d]\} & \text{if } \sup \mathrm{dom}\, M - t \leq d \\ \{\mathit{true}, \neg\mathit{len}(0,d]\} & \text{otherwise} \end{cases}
$$

Thus there is at most one right-continuous change in the valuation of $M^{\mathit{len}(0,d]}$ over $\mathrm{dom}\, M^{\mathit{len}(0,d]}$.

For the induction step, we consider two sample cases. The remaining cases are similar.
Case 1. Consider $M^{f_1 \wedge f_2}$. We have

$$
M^{f_1 \wedge f_2}(t) = M^{f_1}(t) \cup M^{f_2}(t) \cup X(t)
$$

where

$$
X(t) = \begin{cases} \{f_1 \wedge f_2\} & \text{if } f_1 \in M^{f_1}(t) \text{ and } f_2 \in M^{f_2}(t) \\ \{\neg(f_1 \wedge f_2)\} & \text{otherwise} \end{cases}
$$

Clearly $X$ is FV and RC by the second clause of lemma 3, since $M^{f_1}$ and $M^{f_2}$ are. By the first clause of lemma 3, so is $M^{f_1 \wedge f_2}$.

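Case 2. Consider now the case of $M^f$ with $f = [\rightarrow a, \theta_1 \mid \rightarrow b, \theta_2)f'$.

From the definitions of extension and subformula closure we have

$$
M^f(t) = \bigcup_{i=1}^{4} M^{f_i}(t) \cup M^a(t) \cup M^b(t) \cup X(t)
$$

with

$$
X(t) = \begin{cases} \{f\} & \text{if } f_1 \in M^{f_1}(t), \text{ or } f_2 \in M^{f_2}(t), \text{ or } a \in M^a(t) \text{ and } f_3 \in M^{f_3}(t), \text{ or } b \in M^b(t) \text{ and } f_4 \in M^{f_4}(t), \text{ or } B(t) \\ \{\neg f\} & \text{otherwise} \end{cases}
$$

where

$$
\begin{aligned}
f_1 &= [\rightarrow a, \theta_1 \mid \rightarrow)\mathit{false} \\
f_2 &= [\rightarrow b, \theta_2 \mid \rightarrow)\mathit{false} \\
f_3 &= [\theta_1 \mid \rightarrow b, \theta_2)f' \\
f_4 &= [\rightarrow a, \theta_1 \mid \theta_2)f'
\end{aligned}
$$

and $B(t)$ is a boolean condition defined by

$$
B(t) = \exists t' > t \left( \begin{aligned} &\big(a \in M^a(t') \wedge f_3 \in M^{f_3}(t')\big) \vee \big(b \in M^b(t') \wedge f_4 \in M^{f_4}(t')\big) \\ &\wedge\ \forall t'' \big(t \leq t'' < t' \Rightarrow \neg b \in M^b(t'') \wedge \neg a \in M^a(t'')\big) \end{aligned} \right)
$$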
We now show that $B(t)$ is itself RC and FV. By the induction hypothesis each of the functions $M^a$, $M^b$, $M^{f_3}$ and $M^{f_4}$ is RC and FV. Consider now an arbitrary point $t \in \mathrm{dom}\, M$. We have the following possibilities. Either $a \in M^a(t)$ or $b \in M^b(t)$ or neither. In the first two cases $B(t)$ is false, and continues to be false at least up to (but possibly not including) the least $t'$ where neither $a \in M^a(t')$ nor $b \in M^b(t')$. Consider therefore the third case, for which $\neg a \in M^a(t)$ and $\neg b \in M^b(t)$. Now we have two cases depending on whether there is any point $t' > t$ where either $a \in M^a(t')$ or $b \in M^b(t')$.

- Assume not. Then clearly $B$ continues to be false for all $t' > t$.

- In the alternative case, let $t' > t$ be the least point such that either $a \in M^a(t')$ or $b \in M^b(t')$. Then $B$ is false on $[t, t')$ if

$$
\neg\big((a \in M^a(t') \wedge f_3 \in M^{f_3}(t')) \vee (b \in M^b(t') \wedge f_4 \in M^{f_4}(t'))\big),
$$

and otherwise $B$ is true on $[t, t')$.

This establishes the RC of $B$.

Let $D_{M^a}$ represent the set of points at which $M^a$ has a (left) discontinuity, and similarly $D_{M^b}$ for $M^b$. For a subset $S$ of $\mathbb{R}$ and $t \in \mathbb{R}$, let $S|_t = \{s \in S \mid s < t\}$. The FV condition for $M^a$ is then equivalent to saying that $D_{M^a}|_t$ is finite for any $t \in \mathbb{R}$. By the induction hypothesis $M^a$ and $M^b$ are FV, so each of $D_{M^a}$ and $D_{M^b}$ has this property and, therefore, so also does $D_{M^a} \cup D_{M^b}$, and a fortiori any subset of $D_{M^a} \cup D_{M^b}$. As our argument above for RC of $B$ clearly shows, $B$ is constant between any two consecutive points (in the usual ordering) in $D_{M^a} \cup D_{M^b}$. Therefore, $D_B \subseteq D_{M^a} \cup D_{M^b}$, giving FV for $B$. Now, using lemma 3 we obtain RC and FV, first for $X$, and then for $M^f$. □

Note that right continuity of $M^f$ for an arbitrary $f$ gives us theorem 1 as a corollary to theorem 3.

The above theorem plays a crucial role in our completeness proof. The automata that we build in the sequel operate on extended models. Satisfying models for $f$ are obtained by restricting the extended models accepted by the automaton for $f$ to the set $V$ of primitive propositions.

Our definition of reductions yields the following property of extensions, which helps 
motivate the construction of the untimed automaton in § 4.2. 

Notation 2. In what follows, $\mathcal{I}$ represents a string of zero or more interval modalities of the form $[- \mid \theta)$, which we refer to as current modalities.

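Lemma 4. Let $M$ be an admissible model and $f_1 \in \mathit{scl}(f)$ be $M^f(t)$-irreducible. Let $t' \in \mathbb{R}$ be the least $t' > t$ such that $M^f(t') \neq M^f(t)$. Then

a) if $f_1$ is $\mathcal{I}[\theta_1 \mid \theta_2)f_2$ where $\theta_1$ is not $-$ then $(M, t) \models f_1$ iff $(M, t') \models f_1$

b) if $f_1$ is $\mathcal{I}\neg[\theta_1 \mid \theta_2)f_2$ where $\theta_1$ is not $-$ then $(M, t) \models f_1$ iff both $(M, t') \models f_1$ and $(M, t') \not\models \mathcal{I}\mathit{false}$

c) if $f_1$ is $\mathcal{I}\mathit{len}(0, d]$ and $(M, t) \models f_1$ then $(M, t') \models \mathcal{I}\neg\mathit{len}(0, d]$ iff $(M, t') \models \mathcal{I}\mathit{false}$

d) if $f_1$ is $\mathcal{I}\neg\mathit{len}(0, d]$ and $(M, t) \models f_1$ then $(M, t') \not\models \mathcal{I}\mathit{false}$.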

Intuitively, in the first case, if $\mathcal{I}[\theta_1 \mid \theta_2)$ can be constructed, it lies in the strict future of $t$ and, therefore, in the reflexive future of $t'$. In the second case, $[\theta_1 \mid \theta_2)$ can be constructed within $\mathcal{I}$ (its surrounding context), so $\mathcal{I}$ cannot collapse at $t'$. For the third case, $\mathcal{I}$ must collapse at $t'$ since its duration cannot increase in going from $t$ to $t'$. Finally, for the last case, $\mathcal{I}$ cannot collapse before its duration becomes less than $d$ (at the earliest such point $\mathcal{I}\mathit{len}(0, d]$ must hold).

In the following proof, we say that "a search $\rightarrow a$ at $t$ resolves at a point $t' \geq t$ in a model $M$" when either

- $t' = t$ and $(M, t) \models a$, or

- $t' > t$, $(M, t') \models a$ and for all $t''$ such that $t \leq t'' < t'$, $(M, t'') \not\models a$.

Proof sketch of lemma 4. Proofs of each of the four clauses are sketched below. 

[Clause 1.] We sketch only the proof of the forwards direction; the reverse direction follows by similar arguments. From the definition of reductions, we know that since $f_1$ is $M^f(t)$-irreducible, all searches in $\mathcal{I}$, $\theta_1$ and $\theta_2$ must resolve in the strict future of $t$ and not before $t'$. The semantics of searches immediately gives us $(M, t') \models f_1$. Note that none of the searches in $\mathcal{I}$, $\theta_1$ or $\theta_2$ can "fail", since our definition of reductions ensures the reducibility of $f_1$ to $\mathit{true}$ in $M^f(t)$ in such a case.

[Clause 2.] Once again, we sketch only a proof for the forwards direction. The proof that the first consequent follows is essentially along the lines of the last case. We show why the second consequent, $(M, t') \not\models \mathcal{I}\mathit{false}$, also holds. Assume for a contradiction that $(M, t) \models \mathcal{I}\neg[\theta_1 \mid \theta_2\rangle f_2$, $(M, t') \models \mathcal{I}\mathit{false}$ and $f_1$ is $\mathcal{M}^f(t)$-irreducible. As in the last clause, all of the searches in $\mathcal{I}$, $\theta_1$ and $\theta_2$ then resolve in the strict future and not before $t'$. Since all modalities in $\mathcal{I}$ are current, the left endpoints of all these intervals are at $t$. With the above, $(M, t') \models \mathcal{I}\mathit{false}$ implies that the right endpoint of one of the intervals in $\mathcal{I}$ was located at $t'$. From the semantics, therefore, in fact $(M, t') \models \mathcal{I}f$ for an arbitrary formula $f$, and in particular, $(M, t') \models \mathcal{I}[\theta_1 \mid \rightarrow\rangle\mathit{false}$. Now, since $\mathcal{I}[\theta_1 \mid \theta_2\rangle f_2$ is $\mathcal{M}^f(t)$-irreducible, so also is $\mathcal{I}[\theta_1 \mid \rightarrow\rangle\mathit{false}$. By the reverse direction of clause 1 above, therefore, $(M, t) \models \mathcal{I}[\theta_1 \mid \rightarrow\rangle\mathit{false}$. But $\mathcal{I}[\theta_1 \mid \rightarrow\rangle\mathit{false} \in \mathit{red}(f_1)$, thus contradicting the assumption of irreducibility of $f_1$ in $\mathcal{M}^f(t)$.

[Clause 3.] From the semantics, we know that all searches in $\mathcal{I}$ resolve in the strict future, not before $t'$. Thus the right endpoint of the instance of interval $\mathcal{I}$ cannot be before $t'$. If it is at $t'$, then some search in $\mathcal{I}$ caused an interval to “collapse” at $t'$, so that from the semantics $(M, t') \models \mathcal{I}\mathit{false}$ and, therefore, also $(M, t') \models \mathcal{I}\neg\mathit{len}(0, d]$. If not (i.e. if the right endpoint is in the future of $t'$), assume that the right endpoint is located at some $t_1 > t'$; then from the semantics, $t_1 < t + d$. Moreover, since the instance of interval $\mathcal{I}$ beginning at $t'$ also ends at $t_1$, surely the duration of that interval is also less than $d$, since from the above $t_1 < t' + d$. In this case, both $(M, t') \not\models \mathcal{I}\neg\mathit{len}(0, d]$ and $(M, t') \not\models \mathcal{I}\mathit{false}$.

[Clause 4.] The argument here is quite similar to the previous one. The instance of interval $\mathcal{I}$ starting at $t$ ends either at $t'$ or later. In the first case, there must exist a point between $t$ and $t'$ at which an instance of the interval $\mathcal{I}$ (also ending at $t'$) has duration at most $d$. At this time (say, $t_1$), we have $(M, t_1) \models \mathcal{I}\,\mathit{len}(0, d]$, contradicting the assumption that $t'$ is the first time greater than $t$ at which $\mathcal{M}^f$ changes. Thus, we need only consider the second case. In such a case, the instance of $\mathcal{I}$ starting at $t'$ cannot collapse at $t'$, giving us the result. □

Table 1. Example illustrating model extension.

Example 4. Let $M$ be defined by $M(t) = \emptyset$ for $t \in [0, 1)$, $M(t) = \{p\}$ for $t \in [1, 7)$, and $M(t) = \{p, q\}$ for $t \in [7, \infty)$. The reader can verify that $\mathcal{M}^f(t)$ is defined by the matrix shown in table 1, where the $f_i$ are as defined in figure 3. In the table, a row denotes an interval $I$ of $\mathbb{R}$. A formula appearing in a column is in $\mathcal{M}^f(t)$, $t \in I$, iff the entry in that column is a 1, and its negation is in $\mathcal{M}^f(t)$ iff the entry in that column is a 0. The example also illustrates the ideas in lemmas 2 and 4.

Finally, we introduce the following notation that we use to describe the construction of 
the eventuality automaton. 

DEFINITION 12 

[Basis] Let $f$, $f'$ be formulae and let $S$ be a set of formulae such that $f' \prec^*_S f$ and $f'$ is $S$-irreducible. Then $f'$ is a basis formula for $f$ with respect to $S$, and is denoted by $f' = \langle f\rangle_S$.

Note that the basis formula for any formula with respect to a given set is unique. The proof relies on the local confluence property of $\prec_S$ and the absence of infinite descending chains, which together ensure global confluence by Newman’s Diamond Lemma (Newman 1942). It is useful to bear this in mind (and we shall implicitly assume this in our subsequent exposition), although we do not require this property for any of our subsequent proofs.
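Uniqueness of normal forms under termination and local confluence can be checked mechanically on small rewriting systems. The sketch below uses toy string-rewriting rules rather than RTFIL reductions (an assumption made purely for illustration); `normal_forms` enumerates every irreducible term reachable from a start term, so a singleton result is what Newman's lemma predicts for a terminating, locally confluent system.

```python
from typing import Iterable, Set, Tuple

Rule = Tuple[str, str]  # hypothetical toy rule: rewrite lhs to rhs anywhere


def one_step(term: str, rules: Iterable[Rule]) -> Set[str]:
    """All terms obtained by rewriting one occurrence of some rule's lhs."""
    successors: Set[str] = set()
    for lhs, rhs in rules:
        assert lhs, "left-hand sides must be non-empty"
        start = term.find(lhs)
        while start != -1:
            successors.add(term[:start] + rhs + term[start + len(lhs):])
            start = term.find(lhs, start + 1)
    return successors


def normal_forms(term: str, rules: Iterable[Rule]) -> Set[str]:
    """Irreducible terms reachable from term; assumes the system terminates."""
    rules = list(rules)
    successors = one_step(term, rules)
    if not successors:
        return {term}  # term is irreducible: it is its own normal form
    result: Set[str] = set()
    for nxt in successors:
        result |= normal_forms(nxt, rules)
    return result


print(normal_forms("abab", [("a", "")]))
print(normal_forms("aba", [("ab", "c"), ("ba", "d")]))
```

The first rule set only deletes letters and is confluent, so every reduction order reaches the same basis. The second set overlaps on `aba` in a way that cannot be rejoined, so two distinct irreducible forms appear; this is exactly the situation that local confluence of $\prec_S$ rules out.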





Y S Ramakrishna et al 


Example 5. For the case of example 4, for instance, $f_8 = \langle f\rangle_{\mathcal{M}^f(t)}$ for $t \in [1, 7)$ and $\mathit{true} = \langle f\rangle_{\mathcal{M}^f(t)}$ for $t \in [7, \infty)$. Note also that $f$ is irreducible at $t \in [0, 1)$ and is (trivially) its own basis with respect to $\mathcal{M}^f(t)$, $t \in [0, 1)$.


3.3 Interval reductions, clocks and conditions 

In Example 4 there are no formulae involving nested interval modalities. In general, however, a formula may involve nested modalities, so for ease in describing our constructions we require the more general machinery below.

Roughly speaking, the essential “real-time” unit of manipulation by the TBA is a timed current interval formula of the form $\mathcal{I}\,\mathit{len}(0, d]$ or $\mathcal{I}\neg\mathit{len}(0, d]$, where $\mathcal{I} = [- \mid \theta_1\rangle[- \mid \theta_2\rangle \cdots [- \mid \theta_n\rangle$ is a string of zero or more current interval modalities. For such formulae, we also need the concept of an interval reduct. Interval reduction is a relation on strings of current interval modalities and is parameterized by a set of formulae.

DEFINITION 13 

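[Interval Reduction] Let $\mathcal{I}$ and $\mathcal{I}'$ denote strings of current interval modalities and let $S$ be a set of RTFIL formulae. Then $\mathcal{I}'$ is an interval reduct of $\mathcal{I}$ with respect to $S$ iff $\mathcal{I}'\mathit{true} \prec^*_S \mathcal{I}\mathit{true}$. We represent this by $\mathcal{I}' \sqsubset^*_S \mathcal{I}$, and we say that $\mathcal{I}$ is $S$-reducible.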

Note that $\mathcal{I}'$ above may be the “empty” sequence of modalities (which we suppress), which is irreducible with respect to any $S$. We shall simply say “$\mathcal{I}'$ is a reduct of $\mathcal{I}$” instead of “$\mathcal{I}'$ is an interval reduct of $\mathcal{I}$” when there is no confusion.

Among the possible reductions on an interval modality is a special kind of reduction 
called a collapsing reduction. A collapsing reduction may trigger the checking of clock 
conditions on a transition that was just taken, and so our procedure must treat it differently 
from a non-collapsing reduction. This will become clear later when we describe the TBA 
construction. 

DEFINITION 14 

[Collapsing Reductions] Let $\mathcal{I} = I_1 I_2 \cdots I_n$ and $\mathcal{I}' = I'_1 I'_2 \cdots I'_m$ be such that $\mathcal{I}' \sqsubset^*_S \mathcal{I}$ and $m < n$. Then $\mathcal{I}'$ is a collapsed reduct of $\mathcal{I}$, and the corresponding operation is a collapsing reduction.

The important property of interval reductions that we require for the sequel is as follows. Suppose $M$ is admissible, $t \in \mathbb{R}$ and $\mathcal{I}$ is $\mathcal{M}^f(t)$-irreducible. Suppose further that there is a next (least) time $t' > t$ such that $\mathcal{M}^f(t') \neq \mathcal{M}^f(t)$. Then $\mathcal{I}$ is $\mathcal{M}^f(t')$-reducible to $\mathcal{I}'$ if $\mathcal{I}$ is of the form $\mathcal{I}_1[- \mid \rightarrow\alpha\rangle\mathcal{I}_2$ or $\mathcal{I}_1[- \mid \rightarrow\alpha, B\rangle\mathcal{I}_2$, where $\mathcal{I}_1\alpha \in \mathcal{M}^f(t')$. Intuitively, then, $\mathcal{I}$ is equivalent to the syntactically simpler $\mathcal{I}'$ when evaluated at $t'$. Moreover, the reduction of $\mathcal{I}$ in $\mathcal{M}^f(t')$ is collapsing in the case that $\mathcal{I}$ has the first form. Essentially, this means that if the interval $\mathcal{I}$ is evaluated at time $t$, it will “end” at time $t'$ and, if it is evaluated at $t'$, it will be empty.
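The reduction of a string of current modalities can be sketched as follows. This is a simplification under stated assumptions: each modality $[- \mid \rightarrow a_1, \ldots, \rightarrow a_k\rangle$ is encoded as the tuple `(a1, ..., ak)` of primitive search targets, and the side condition $\mathcal{I}_1\alpha \in S$ is approximated by $\alpha \in S$, ignoring the enclosing context $\mathcal{I}_1$. Reductions are applied until the string is irreducible, and the result is flagged as collapsed when modalities were removed (the $m < n$ test of Definition 14).

```python
from typing import FrozenSet, Optional, Tuple

Modality = Tuple[str, ...]       # hypothetical: [- | ->a1, ..., ->ak> as (a1, ..., ak)
Interval = Tuple[Modality, ...]  # a string of current modalities


def reduce_once(interval: Interval, state: FrozenSet[str]) -> Optional[Interval]:
    """One reduction step w.r.t. state, or None if interval is state-irreducible."""
    for k, searches in enumerate(interval):
        if searches and searches[0] in state:
            rest = searches[1:]
            if rest:  # [- | ->a, B> reduces to [- | B>
                return interval[:k] + (rest,) + interval[k + 1:]
            return interval[:k] + interval[k + 1:]  # [- | ->a> collapses
    return None


def interval_reduct(interval: Interval, state: FrozenSet[str]) -> Tuple[Interval, bool]:
    """Irreducible reduct of interval, and whether the reduction was collapsing."""
    current = interval
    while (nxt := reduce_once(current, state)) is not None:
        current = nxt
    return current, len(current) < len(interval)


# Example 6: [- | ->p, ->q> in a state where only p holds, then where p and q hold.
print(interval_reduct((("p", "q"),), frozenset({"p"})))
print(interval_reduct((("p", "q"),), frozenset({"p", "q"})))
```

Applying the reduction repeatedly within one state mirrors Example 6, where $[- \mid \rightarrow p, \rightarrow q\rangle$ collapses outright at $t \in [7, \infty)$ because both searches resolve at once.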

Example 6. Continuing with Example 4, the modality $[- \mid \rightarrow q\rangle$ collapses at all $t \in [7, \infty)$. The modality $[- \mid \rightarrow p, \rightarrow q\rangle$ reduces to $[- \mid \rightarrow q\rangle$ at $t \in [1, \infty)$ and collapses at $t \in [7, \infty)$. In each case, the “set” with respect to which the collapse or reduction occurs is $\mathcal{M}^f(t)$, for the appropriate $t$.

We also use reductions on intervals to keep track of the “remaining searches” of an 
interval as it is timed by an active clock of the automaton. 

The clock closure and clock condition sets defined below represent the clocks and associated conditions required by a TBA during the satisfiability procedure. Thus, while deciding a formula $f$, the automaton $A(f)$ never needs any timers other than those in clocks($f$), and the conditions appearing on its transitions are contained in the set clkconds($f$).

DEFINITION 15 

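[Clock Closures] Given a formula $f$, its clock closure set, denoted clocks($f$), is the smallest set satisfying the following conditions:

1. if $\mathcal{I}\,\mathit{len}(0, d] \in scl(f)$ then $c^{\alpha,\mathcal{I},d}_{\mathcal{I}} \in$ clocks($f$)

2. if $c^{\alpha,\mathcal{I},d}_{\mathcal{I}_1} \in$ clocks($f$) and $\mathcal{I}_2 \sqsubset^*_S \mathcal{I}_1$, for $S \subseteq scl(f)$, then $c^{\alpha,\mathcal{I},d}_{\mathcal{I}_2} \in$ clocks($f$)

3. if $c^{\alpha,\mathcal{I},d}_{\mathcal{I}_1} \in$ clocks($f$) then $c^{\beta,\mathcal{I},d}_{\mathcal{I}_1} \in$ clocks($f$)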

DEFINITION 16 

[Clock Condition Set] Given a formula $f$, its clock condition set, clkconds($f$), is the set of conditions of the form

- $c < d$ for all $c = c^{\alpha,\mathcal{I},d}_{\mathcal{I}_1} \in$ clocks($f$)

- $c = d$ for all $c = c^{\beta,\mathcal{I},d}_{\mathcal{I}_1} \in$ clocks($f$)

Intuitively, $\alpha$-clocks enforce upper-bound constraints and $\beta$-clocks enforce lower-bound constraints. States in the TBA for a formula will contain “clock-activity sets,” which indicate the clocks that are active. The clock $c^{\gamma,\mathcal{I},d}_{\mathcal{I}_1}$ (where $\gamma$ is either $\alpha$ or $\beta$) will be made active at a state within an instance of an interval $\mathcal{I}$ when it is necessary to time $\mathcal{I}$, and $\mathcal{I}_1$ is the interval that remains to complete the instance of $\mathcal{I}$.

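Example 7. Let $f$ be $[\rightarrow p \mid \rightarrow p, \rightarrow q\rangle\neg\mathit{len}(0, 3]$. Then clocks($f$) contains the clocks $c^{\alpha,[\rightarrow p, \rightarrow q\rangle,3}_{[\rightarrow p, \rightarrow q\rangle}$ and $c^{\alpha,[\rightarrow p, \rightarrow q\rangle,3}_{[\rightarrow q\rangle}$, and their $\beta$ counterparts.⁸ The clock condition associated with $c = c^{\alpha,[\rightarrow p, \rightarrow q\rangle,3}_{[\rightarrow q\rangle}$ is $c < 3$, and that associated with its $\beta$-counterpart $c' = c^{\beta,[\rightarrow p, \rightarrow q\rangle,3}_{[\rightarrow q\rangle}$ is $c' = 3$.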
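For a single current modality, the clocks generated by Definition 15 and their conditions from Definition 16 can be enumerated as below. The tuple encoding `(gamma, I, d, I_1)` of $c^{\gamma,\mathcal{I},d}_{\mathcal{I}_1}$ and the function names are hypothetical, and only the non-collapsed reducts obtained by resolving leading searches are generated, which matches the clocks listed in Example 7.

```python
from typing import Set, Tuple

# Hypothetical encoding of c^{gamma,I,d}_{I_1} as (gamma, I, d, I_1), where a
# current modality [- | ->a1, ..., ->ak> is written as the tuple (a1, ..., ak).
Clock = Tuple[str, Tuple[str, ...], int, Tuple[str, ...]]


def clocks_for(searches: Tuple[str, ...], d: int) -> Set[Clock]:
    """alpha- and beta-clocks for I len(0, d] with I a single current modality."""
    clocks: Set[Clock] = set()
    for k in range(len(searches)):
        remainder = searches[k:]  # non-collapsed reduct: first k searches resolved
        for gamma in ("alpha", "beta"):
            clocks.add((gamma, searches, d, remainder))
    return clocks


def clock_condition(clock: Clock) -> str:
    """Condition from Definition 16: c < d for alpha-clocks, c = d for beta-clocks."""
    gamma, _, d, _ = clock
    return f"c < {d}" if gamma == "alpha" else f"c = {d}"


# Example 7: I = [->p, ->q> and d = 3.
for clock in sorted(clocks_for(("p", "q"), 3)):
    print(clock, clock_condition(clock))
```

For a modality with $k$ searches this yields $2k$ clocks; nesting and the quantification over all $S \subseteq scl(f)$ in Definition 15 are what push the general count towards the bound of Lemma 5.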

As in Ramakrishna et al (1992), let the number of logical connectives and primitive 
propositions in an RTFIL formula be its size, and the depth of nesting of interval modalities, 
plus one, be its depth. The following lemma is straightforward. 

Lemma 5. For an RTFIL formula $f$ of size $n$ and depth $k$, $|scl(f)| = O(n^k)$ and $|\mathrm{clocks}(f)| = O(n^{2k})$.


⁸ We are using the abbreviation $[\theta\rangle$ for $[- \mid \theta\rangle$.

4. Decision procedure

We now have most of the formal machinery required to describe the construction of the TBA $A_m(f)$ corresponding to a formula $f$ whose satisfiability is being checked. The construction of $A_m$ is described in four steps.

In the first step, we construct a BA $A_u(f)$ containing timing assertions in its states. This construction is similar to the construction of the local automaton for the untimed case (Ramakrishna et al 1992). Intuitively, the automaton produced in this first step ensures that all timing-independent safety conditions are satisfied and also checks some simple consistency conditions relating to real time. The BA $A_u(f)$ accepts the untiming of any timed string corresponding to a model of $f$, but may also accept other strings, since it does not fully take into account the real-time constraints imposed by $f$. The states of $A_u(f)$ are annotated by timing assertions that encode these constraints.

The second step is the heart of the construction. This step constructs a TBA, $A_t(f)$, from $A_u(f)$ in such a manner that all timing assertions of the form $\mathcal{I}\,\mathit{len}(0, d]$ and $\mathcal{I}\neg\mathit{len}(0, d]$ annotating the states of $A_u(f)$ are encoded as timer-related actions of the TBA. Each state of the TBA $A_t(f)$ has a set of “active clocks,” a subset of clocks($f$), that it uses to enforce the timing assertions. The edges of $A_t(f)$ have timer-resetting and comparison actions. $A_t(f)$ thus ensures that all timing-based properties are handled properly, in addition to the timeless safety conditions. In this connection, it is useful to note that a time-bounded liveness property is really a safety property: the time bound must not pass before the liveness property is satisfied. That the requisite time must eventually pass (the condition of non-Zenoness) is an implicit liveness condition.

In order to take care of the timeless liveness conditions, we construct the eventuality automaton $A_e(f)$ in the third step of the construction of $A_m$. The eventuality automaton is a pure BA, without any timers. It is constructed in essentially the same manner as for FIL (Ramakrishna et al 1992).

The final automaton $A_m(f)$ is a product of $A_t(f)$ and $A_e(f)$. The formula $f$ is satisfiable iff the TBA $A_m(f)$ accepts some timed string. We use the procedure of Alur & Dill (1990) to solve the emptiness problem.

An interesting aspect of RTFIL is reflected in this construction. The local automaton $A_t(f)$ might accept Zeno runs, but $A_m(f)$ does not. This is because in RTFIL, unlike for instance MITL (Alur et al 1991), there is an implicit liveness condition associated with every timing constraint, namely, that the right endpoint of an interval satisfying the timing constraint is eventually found. This allows us, in effect, to dispense with the “progressiveness check” that Alur & Dill (1990) require while checking the emptiness of the final TBA.


4.1 Hintikka sets 

Most constructive decision procedures use sets of formulae to construct the “components” 
of a canonical model for a given formula. The formulae in the sets, like the states in 
the model extensions above, give a complete characterization of that component of the 
model in terms of not only the atomic formulae (primitive propositions), but also more 
complicated formulae. Following tradition (Smullyan 1968; Emerson 1990), we call such a set of formulae a Hintikka set for an RTFIL formula.

DEFINITION 17 

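[Hintikka Set] A Hintikka set for a formula $f$ is a subset $s$ of scl($f$) satisfying the following conditions:

1. for all $f_1 \in scl(f)$, $f_1 \in s$ iff $\neg f_1 \notin s$

2. for all $\mathcal{I}\mathit{true} \in scl(f)$ such that $\mathcal{I}\mathit{true}$ is $s$-irreducible, $\mathcal{I}\mathit{true} \in s$ and $\mathcal{I}\mathit{false} \notin s$

3. for all $\mathit{len}(0, d] \in scl(f)$, $\neg\mathit{len}(0, d] \in s$

4. for all $\mathcal{I}f_1 \in scl(f)$ such that $\mathcal{I}f_1$ is $s$-irreducible, $\mathcal{I}f_1 \in s$ iff $\mathcal{I}\neg f_1 \notin s$

5. for all $\mathcal{I}(f_1 \wedge f_2) \in scl(f)$ such that $\mathcal{I}(f_1 \wedge f_2)$ is $s$-irreducible,

$\mathcal{I}(f_1 \wedge f_2) \in s$ iff $\mathcal{I}f_1 \in s$ and $\mathcal{I}f_2 \in s$

6. for all $\mathcal{I}f_1 \in scl(f)$ such that $f_1$ is purely propositional and $\mathcal{I}f_1$ is $s$-irreducible, $\mathcal{I}f_1 \in s$ iff $f_1 \in s$ (note that $\mathit{len}(0, d]$ is not propositional)

7. for all $\mathcal{I}\neg[- \mid \theta_2\rangle f_1 \in scl(f)$ such that $\mathcal{I}\neg[- \mid \theta_2\rangle f_1$ is $s$-irreducible,

$\mathcal{I}\neg[- \mid \theta_2\rangle f_1 \in s$ iff $\mathcal{I}[- \mid \theta_2\rangle\neg f_1 \in s$

8. for all $f_1, f_2 \in scl(f)$ such that $f_1 \prec^*_s f_2$, $f_1 \in s$ iff $f_2 \in s$

The set of all Hintikka sets for $f$ is denoted $H(f)$.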
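Conditions 1 and 5 of Definition 17 are easy to check mechanically. The sketch below handles only these two conditions, and only for an empty context $\mathcal{I}$ (which is trivially irreducible); formulae are encoded as nested tuples such as `("and", "p", "q")`, a representation we assume for illustration.

```python
from typing import Collection, Tuple, Union

Formula = Union[str, Tuple]  # hypothetical encoding: "p", ("not", f), ("and", f, g)


def neg(f: Formula) -> Formula:
    """Negation, identifying not-not-f with f as is usual for closures."""
    return f[1] if isinstance(f, tuple) and f[0] == "not" else ("not", f)


def rule1_holds(s: Collection[Formula], closure: Collection[Formula]) -> bool:
    """Condition 1: for every f1 in the closure, f1 in s iff not-f1 not in s."""
    return all((f1 in s) != (neg(f1) in s) for f1 in closure)


def rule5_holds(s: Collection[Formula], closure: Collection[Formula]) -> bool:
    """Condition 5 with empty context: (f1 and f2) in s iff f1 in s and f2 in s."""
    return all(
        (f in s) == (f[1] in s and f[2] in s)
        for f in closure
        if isinstance(f, tuple) and f[0] == "and"
    )


base = ["p", "q", ("and", "p", "q")]
closure = base + [neg(f) for f in base]
good = {"p", "q", ("and", "p", "q")}
bad = {"p", ("not", "q"), ("and", "p", "q")}  # complete, but not Hintikka
print(rule1_holds(bad, closure), rule5_holds(bad, closure))
```

The set `bad` is complete and propositionally consistent in the sense of condition 1, yet violates condition 5; by Lemma 6 such a set cannot be satisfiable.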

As a result of the first rule, Hintikka sets are complete and consistent in the sense of p. 160. However, they may contain temporal inconsistencies that make them unsatisfiable. The completeness proof for our decision procedure uses the fact that if a set is not Hintikka then it is unsatisfiable. Thus, it suffices to consider Hintikka sets in the automaton construction, as we shall see shortly.

Lemma 6. Any complete subset of scl($f$) that is not Hintikka is not satisfiable.

Proof. Assume that $s$ is a complete subset of scl($f$) that is not Hintikka. We use a case analysis on the condition in definition 17 that $s$ violates. Consider, for instance, the last condition. Assume that $f_1 \prec^*_s f_2$, $f_1 \in s$ but $f_2 \notin s$. Since $s$ is complete, $\neg f_2 \in s$. Let $M$ be a satisfying model for $s$. Then $(M, 0) \models f_1$ and for all $\alpha \in s$, $(M, 0) \models \alpha$, so that by Corollary 1, $(M, 0) \models f_2$. But $\neg f_2 \in s$ and, thus, $(M, 0) \models \neg f_2$, a contradiction. The case of $f_1 \notin s$ and $f_2 \in s$ is similar.

Arguments for the remaining cases can be made in a similar manner, using the semantics of the logic to exhibit a contradiction. □

It follows that each state $\mathcal{M}^f(t)$ of the extended model $\mathcal{M}^f$ is Hintikka. However, not every $\omega$-sequence of Hintikka sets is the extension of a model, because the consecution of states in the sequence might be unsatisfiable.

Example 8. When $\mathcal{M}^f$ is constant throughout the interval $[t_1, t_2)$, let $\mathcal{M}^f[t_1, t_2)$ denote its value in that interval. In Example 4, it is clear that the sets $S_1 = \mathcal{M}^f[0, 1)$, $S_2 = \mathcal{M}^f[1, 4)$, $S_3 = \mathcal{M}^f[4, 7)$ and $S_4 = \mathcal{M}^f[7, \infty)$ are Hintikka. In this case, the conjunction of formulae in a Hintikka set is satisfiable. However, consider the set $S_5 = (S_1 \setminus \{\neg f_{11}\}) \cup \{f_{11}\}$. This set is Hintikka by our definition above, but is not satisfiable, because the conjunction of $\neg f_8$ and $f_{11}$ cannot be satisfied in any model. Such “temporal conflicts” are detected by the consecution and acceptance conditions of $A_e(f)$ and $A_t(f)$, as will become clear in the sequel.

4.2 Untimed construction 

Having obtained the candidate states for $A_u(f)$ as Hintikka sets above, we must now connect them together appropriately. Compared to FIL (Ramakrishna et al 1992), the only new feature now is the presence of formulae of the form $\mathcal{I}\,\mathit{len}(0, d]$ and $\mathcal{I}\neg\mathit{len}(0, d]$. Reductions on such formulae in a given state are essentially as before. However, consecution of two different states imposes further conditions on the timing assertions that these two states may contain, in addition to the reducibility of non-current interval formulae from one state to the next.

DEFINITION 18 

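[Untimed Construction] $A_u(f)$ is the BA with

- Input alphabet $2^{scl(f)}$

- State set $H(f)$

- Non-deterministic transition function $\rho_u$ defined on $H(f) \times 2^{scl(f)}$ such that $\rho_u$ allows $s \xrightarrow{i} t$ iff

1. $i = s$

2. if $\mathcal{I}[\theta_1 \mid \theta_2\rangle f_1 \in s$ is $s$-irreducible and $\theta_1$ is not $-$, then $\mathcal{I}[\theta_1 \mid \theta_2\rangle f_1 \in t$

3. if $\mathcal{I}\neg[\theta_1 \mid \theta_2\rangle f_1 \in s$ is $s$-irreducible and $\theta_1$ is not $-$, then $\mathcal{I}\neg[\theta_1 \mid \theta_2\rangle f_1 \in t$ and $\mathcal{I}\mathit{false} \notin t$

4. if $\mathcal{I}\,\mathit{len}(0, d] \in s$ is $s$-irreducible, then if $\mathcal{I}\neg\mathit{len}(0, d] \in t$, $\mathcal{I}$ has a collapsing reduction in $t$

5. if $\mathcal{I}\neg\mathit{len}(0, d] \in s$ is $s$-irreducible, then $\mathcal{I}\mathit{false} \notin t$

- Accepting state set $H(f)$

- Initial state set $\{s \in H(f) \mid f \in s\}$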

The first transition rule ensures that the automaton consumes only Hintikka sets. The remaining transition rules reflect the conditions stated in lemma 4. Observe that $\rho_u$ is reflexive, allowing the automaton to (non-deterministically) stay in state $s$ whenever the input letter $i$ equals $s$.

Example 9. Consider the Hintikka sets $S_1, \ldots, S_4$ of the last example, and $\mathcal{M}^f$ of Example 4. If we feed $\mathrm{untime}(\mathcal{M}^f)$ to $A_u(f)$ as an untimed $\omega$-string, then the resulting run is shown in figure 5. The vertices represent states of the automaton and the edge labels represent letters of the input string. Note that the automaton $A_u(f)$ has many other states and transitions, but for brevity only those in the locus of this run are shown. The reader can verify that the transition conditions given above are satisfied for each transition shown.

Figure 5. A run of $A_u$ for example 9: $S_1 \to S_2 \to S_3 \to S_4$.
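Because clause 1 of Definition 18 requires $i = s$, the run of $A_u(f)$ on an untimed string is forced: the state sequence is the input sequence itself, provided every consecutive pair satisfies the remaining clauses. The predicate `allowed` below is a hypothetical stand-in for clauses 2 to 5, and the strings `"S1"` to `"S4"` are mere labels for the sets of Example 8.

```python
from typing import Callable, List, Optional, Sequence


def forced_run(letters: Sequence[str], allowed: Callable[[str, str], bool]) -> Optional[List[str]]:
    """Run of A_u: clause 1 (i = s) forces state j to equal letter j."""
    for s, t in zip(letters, letters[1:]):
        if not allowed(s, t):  # hypothetical stand-in for clauses 2 to 5
            return None  # no run exists on this input
    return list(letters)


def distinct_visits(run: Sequence[str]) -> List[str]:
    """Drop stuttering self-loops to show the locus of the run."""
    visits: List[str] = []
    for state in run:
        if not visits or visits[-1] != state:
            visits.append(state)
    return visits


# A stuttering input in which each label is repeated while M^f is constant.
letters = ["S1", "S2", "S2", "S2", "S3", "S3", "S3", "S4"]
print(distinct_visits(forced_run(letters, lambda s, t: True)))
```

This is also the uniqueness argument used later in the proof of lemma 10: once the input is fixed, there is no choice left in the state sequence.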

4.3 Timing augmentation 

The timing augmentation systematically examines each state of the automaton built above, starting from an initial state, adding activity indicators to its states and clock conditions to its transitions, and splitting states when necessary. State-splitting occurs when different paths from an initial state to some state of $A_u(f)$ require different sets of timers to be active. The resulting automaton is the required local TBA.

The augmentation is described in two steps. First, we replicate the states of $A_u(f)$, pairing the replicas with subsets of clocks($f$), to obtain the states of $A_t(f)$. Intuitively, for $(s, a_s) \in H(f) \times 2^{\mathrm{clocks}(f)}$, the clock-activity set $a_s$ represents the clocks that are active in this replica of the state $s$ of $A_u(f)$. We then define the transition function of $A_t(f)$ to permit only “legal” transitions between the states produced by this replication process. While this style of exposition clarifies the underlying mechanics, it is generally more expedient to perform a breadth-first traversal of $A_u(f)$, adding clock-activity sets to its states and splitting states as required. Although the worst-case behaviour of this latter augmentation procedure may be as bad as that of the naive method of the description, in general the latter procedure never creates many unreachable replicas.

For expositional reasons, we allow the transitions of $A_t(f)$ to copy the value of a clock $c_1$ into a clock $c_2$ provided that $c_1$ has the form $c^{\gamma,\mathcal{I},d}_{\mathcal{I}_1}$, $c_2$ has the form $c^{\gamma,\mathcal{I},d}_{\mathcal{I}_2}$, and $\mathcal{I}_2 \sqsubset^* \mathcal{I}_1$. Thus, in addition to clock-resetting actions, we allow restricted copying actions. This method of description clarifies the underlying reasoning better than a direct encoding into a conventional TBA. A slightly unnatural clock-naming scheme would allow us to rename the clocks in $A_t(f)$ while eliminating the copying actions on its transitions. For instance, it is easy to see that instead of the copy action $c_2 \leftarrow c_1$, a “shadow clock” could be started simultaneously with $c_1$ and used in place of $c_2$ following the copy action. This simple-minded scheme, however, increases the number of clocks quadratically and increases the number of states by a factor exponential in the number of clocks. A slightly more sophisticated scheme, taking account of properties of interval reductions, allows us to encode copy actions without increasing the number of clocks, while keeping the number of states essentially the same. Note for this that the clocks form a natural partial order under the copying relation. We give details of this construction in the next subsection.

Note also that clock-activity sets are not mentioned in the definition of TBAs given earlier or in the original definition in Alur & Dill (1990). It is easy, however, to modify the definition of TBAs and the emptiness algorithm in Alur & Dill (1990) to handle clock-activity sets in a straightforward manner; see Dill (1989), for instance, where a similar concept is used.
Below we formalize the operations of clock activation and deactivation, which we use in our construction of the TBA $A_t$.

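DEFINITION 19

[Clock Operations] The transition $(s, a_s) \xrightarrow{i} (t, a_t)$ of $A_t(f)$ activates the clock $c$ iff one of the following holds:

1. $c = c^{\alpha,\mathcal{I},d}_{\mathcal{I}}$, $\mathcal{I}\,\mathit{len}(0, d] \in t$, $\mathcal{I}$ is irreducible in $t$, and $\mathcal{I}\,\mathit{len}(0, d] \in s$ only if $\mathcal{I}$ is reducible in $s$

2. $c = c^{\beta,\mathcal{I},d}_{\mathcal{I}}$, $\mathcal{I}\neg\mathit{len}(0, d] \in s$, $\mathcal{I}\,\mathit{len}(0, d] \in t$, and $\mathcal{I}$ is irreducible in both $s$ and $t$

3. $c = c^{\gamma,\mathcal{I},d}_{\mathcal{I}_2}$, $\mathcal{I}_2$ is irreducible in $t$, $c^{\gamma,\mathcal{I},d}_{\mathcal{I}_1} \in a_s$, $\mathcal{I}_1$ is irreducible in $s$, $\mathcal{I}_2 \sqsubset^*_t \mathcal{I}_1$, and this reduction is not collapsing

The transition deactivates clock $c$ iff $c = c^{\gamma,\mathcal{I},d}_{\mathcal{I}_1}$ is in $a_s$ and $\mathcal{I}_1$ is reducible in $t$.

We now define the automaton $A_t(f)$.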

DEFINITION 20 

[Timing Augmentation] Let $A_u(f)$ be an untimed automaton such as obtained above. Then its timing augmentation, denoted $A_t(f)$, is the TBA with:

- State set $H(f) \times 2^{\mathrm{clocks}(f)}$

- Input Alphabet $2^{scl(f)}$

- Clock Set clocks($f$)

- Non-deterministic transition function $\rho_t$ defined on $(H(f) \times 2^{\mathrm{clocks}(f)}) \times 2^{scl(f)}$ such that $\rho_t$ allows the transition $(s, a_s) \xrightarrow{i,\,C,\,\phi} (t, a_t)$ iff

1. $s \xrightarrow{i} t$ is allowed by $\rho_u$

2. $a_t$ consists of all clocks that are activated by the transition and all clocks of $a_s$ that are not deactivated by the transition

3. if the transition activates $c_2$ by the third rule of definition 19, $c_1 = c^{\gamma,\mathcal{I},d}_{\mathcal{I}_1}$ and $c_2 = c^{\gamma,\mathcal{I},d}_{\mathcal{I}_2}$ are the clocks in this rule, and $\gamma = \beta$, then for all $c'_1 = c^{\beta,\mathcal{I},d}_{\mathcal{I}'_1} \in a_s$ such that $\mathcal{I}'_1 \neq \mathcal{I}_1$, it is not the case that $\mathcal{I}_2 \sqsubset^*_t \mathcal{I}'_1$

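4. $C$ contains the reset action “$c \leftarrow 0$” iff the transition activates $c$ by either the first or the second rule of definition 19

5. $C$ contains the copy action “$c_2 \leftarrow c_1$” iff the transition activates $c_2$ by the third rule of definition 19, $c_1 = c^{\gamma,\mathcal{I},d}_{\mathcal{I}_1}$ and $c_2 = c^{\gamma,\mathcal{I},d}_{\mathcal{I}_2}$ are the clocks in this rule, and, if $\gamma = \alpha$, then for all $c'_1 = c^{\alpha,\mathcal{I},d}_{\mathcal{I}'_1} \in a_s$ such that $\mathcal{I}_2 \sqsubset^*_t \mathcal{I}'_1$ we have $\mathcal{I}_1 \sqsubset^* \mathcal{I}'_1$

6. $\phi$ contains the clock condition $c < d$ iff $c = c^{\alpha,\mathcal{I},d}_{\mathcal{I}_1} \in a_s$

7. $\phi$ contains the clock condition $c = d$ iff $c = c^{\beta,\mathcal{I},d}_{\mathcal{I}_1} \in a_s$ and $\mathcal{I}_1$ has a collapsing reduction in $t$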
- Initial state set: all $(s, a_s)$ such that $f \in s$ and $a_s = \{c^{\alpha,\mathcal{I},d}_{\mathcal{I}} \mid \mathcal{I}\,\mathit{len}(0, d] \in s \text{ and } \mathcal{I} \text{ is } s\text{-irreducible}\}$

- Accepting state set $H(f) \times 2^{\mathrm{clocks}(f)}$

The intuition behind the augmentation procedure is as follows. Rule 1 ensures that any model of $A_t(f)$, when untimed, is accepted by $A_u(f)$. Rules 2 and 4 ensure that the appropriate clocks get started whenever there is a new upper- or lower-bound condition to verify, and that conditions are remembered until discharged. Rules 6 and 7 ensure that the upper- and lower-bound timers are compared with their prescribed limits when the ends of intervals are reached. Rule 5 frees up timers for reuse. The condition for $\alpha$-clocks in the last part of that rule states that if there are two running instances of an interval that reduce to the same one, the older instance continues to be timed for the upper bound. Rule 3 guarantees that such a condition will not arise for $\beta$-clocks.
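The clock actions and conditions of Definition 20 follow the usual timed-automaton semantics: all clocks advance with time, a transition's conditions are evaluated on the current clock values, and its resets take effect afterwards (the convention of Alur & Dill (1990)). The checker below is a minimal sketch under these assumptions; it supports only the two relations $c < d$ and $c = d$ appearing in rules 6 and 7, omits copy actions, and assumes conditions mention only active clocks. The step format is our own.

```python
from fractions import Fraction
from typing import Dict, List, Sequence, Tuple

Guard = Tuple[str, str, Fraction]                 # (clock, "<" or "=", bound d)
Step = Tuple[Fraction, List[Guard], List[str]]    # (time, conditions, resets)


def check_timed_run(steps: Sequence[Step]) -> bool:
    """True iff every transition's clock conditions hold; times are non-decreasing."""
    values: Dict[str, Fraction] = {}  # values of the clocks reset so far
    now = Fraction(0)
    for time, guards, resets in steps:
        elapsed = time - now
        values = {c: v + elapsed for c, v in values.items()}  # time passes
        now = time
        for clock, op, bound in guards:  # conditions use pre-reset values
            v = values[clock]
            if not (v < bound if op == "<" else v == bound):
                return False
        for clock in resets:  # reset actions "c <- 0"
            values[clock] = Fraction(0)
    return True


F = Fraction
# An alpha-clock started at time 4 and compared with "c < 3" at time 6.
print(check_timed_run([(F(4), [], ["c"]), (F(6), [("c", "<", F(3))], [])]))
```

Exact rationals are used so that equality conditions such as $c = d$ are not disturbed by floating-point rounding.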

Example 10. Recall example 9, where we illustrated an accepting run of $A_u(f)$. Figure 6 shows the corresponding accepting run of $A_t(f)$ on our now familiar $\mathcal{M}^f$. The states of $A_t(f)$ shown in the figure are $S'_1 = (S_1, \emptyset)$, $S'_2 = (S_2, \emptyset)$, $S'_3 = (S_3, \{c, c'\})$ and $S'_4 = (S_4, \emptyset)$, where $c$ and $c'$ are the clocks of Example 7. The edge labels also indicate associated clock conditions and/or clock actions.

Although the role of clock c is superfluous in the run shown above, in general it may be 
required. 

4.3a Eliminating copying of clocks. Notice that the only clause in the transition conditions of $A_t$ that requires copying of one clock’s value into another is clause 5. In the following we describe how such copying actions can be eliminated in order to obtain a conventional TBA. First of all, we note that we may rename the timers in clocks($f$) so that each clock $c^{\gamma,\mathcal{I},d}_{\mathcal{I}_1}$ is replaced by a unique clock $c^{\gamma,\mathcal{I},d}_i$, $i \in \{1, \ldots, m\}$, where $m = |\{\mathcal{I}' \mid \mathcal{I}' \sqsubset^* \mathcal{I}\}|$. Further, we associate with each state a tagging function, which associates with each clock active in that state an element of $\{\mathcal{I}' \mid \mathcal{I}' \sqsubset^* \mathcal{I}\}$. The essential idea is that, instead of copying the value of one clock $c$ into another clock $c'$ on a transition, we simply update the tag function on $c$ in the next state. The tag function, thus, keeps track of the remaining suffix of the interval being timed by a clock. This will not work in case a transition also resets the active clock following a copying action, since the old value would get “clobbered.” In such a case (i.e. if the transition also resets the source clock of a copy action), we simply pick an inactive clock with the same superscript (perhaps with the lowest subscript among those available) and activate it. It is not difficult to see that in every such case an inactive clock will always be available. This takes care of all cases of copying. When a clock’s tag collapses, indicating the end of an interval, it is compared with its upper or lower bound as appropriate, and returned to the pool of inactive clocks.

We need only show that the number of clocks suffices, i.e. that there is always a clock of the required kind available when we want to pick an inactive one. But this is clear from the fact that for a clock with superscript $\mathcal{I}$ there cannot be more than $|\{\mathcal{I}' \mid \mathcal{I}' \sqsubset^* \mathcal{I}\}|$ copies ever required, since there will never be more than that many instances of the interval $\mathcal{I}$ active simultaneously. We have thus eliminated all copying actions while keeping the number of clocks the same as before. However, in comparison with our original construction, which involves copying between clocks, there is an increase in the number of states, because of the association of active clock sets and tag functions with states. In fact, the total number of states in the resulting TBA is now bounded above by $|H(f)| \cdot 2^{O(n^{2k} \cdot k \cdot \log n)}$.
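The tagging scheme of this subsection can be sketched as a pool of interchangeable clocks sharing one superscript $(\gamma, \mathcal{I}, d)$. The class and method names are hypothetical; `retag` plays the role of the eliminated copy action, `start` picks the lowest-numbered inactive clock as suggested above, and `release` returns the value to be compared with the bound when a tag collapses.

```python
from typing import Dict, List, Tuple

Suffix = Tuple[str, ...]  # remaining searches of the interval being timed


class ClockPool:
    """Hypothetical pool of clocks c_1..c_m sharing one superscript (gamma, I, d)."""

    def __init__(self, size: int) -> None:
        self.free: List[int] = list(range(1, size + 1))  # inactive subscripts
        self.value: Dict[int, float] = {}                # active clock values
        self.tag: Dict[int, Suffix] = {}                 # tag of each active clock

    def advance(self, delta: float) -> None:
        """Let time pass: every active clock increases by delta."""
        for j in self.value:
            self.value[j] += delta

    def start(self, suffix: Suffix) -> int:
        """Activate the lowest-numbered inactive clock at 0 (a reset action)."""
        j = min(self.free)
        self.free.remove(j)
        self.value[j] = 0.0
        self.tag[j] = suffix
        return j

    def retag(self, j: int, suffix: Suffix) -> None:
        """Replaces the copy action: the value is kept, only the tag changes."""
        self.tag[j] = suffix

    def release(self, j: int) -> float:
        """Tag collapsed: return the value for the bound check and free clock j."""
        self.free.append(j)
        del self.tag[j]
        return self.value.pop(j)


pool = ClockPool(2)
a = pool.start(("p", "q"))   # time an instance of [- | ->p, ->q>
pool.advance(1.0)
pool.retag(a, ("q",))        # ->p resolved: update the tag instead of copying
b = pool.start(("p", "q"))   # same transition starts a new instance on a fresh clock
pool.advance(2.0)
print(pool.tag, pool.value)
```

In the last two steps the older instance keeps its accumulated value under the new tag while the reset goes to a different clock, which is exactly the case in which copying would otherwise clobber the old value.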

4.4 Eventuality automaton 

This is essentially the same as the construction in Ramakrishna et al (1992) to which we 
refer the reader for more details and intuition. 

DEFINITION 21 

[Eventuality Automaton] $A_e(f)$ is the BA with

- Input Alphabet $2^{scl(f)}$

- State Set $2^{E(f)}$, where $E(f)$ is the subset of scl($f$) that contains all formulae of the form $\neg[\theta \mid \rightarrow\rangle\mathit{false}$

- Deterministic transition function $\rho_e$ defined on $2^{E(f)} \times 2^{scl(f)}$ such that $s \xrightarrow{i} t$ satisfies

1. $t = \{f_1 \in E(f) \cap i \mid f_1 \text{ is } i\text{-irreducible}\}$ when $s = \emptyset$
2. t = G E(/) | /i G s} when s ^ 0 

- Accepting state set $\{\emptyset\}$

- Initial state set $\{\emptyset\}$
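The pending-obligation behaviour of $A_e(f)$ can be simulated along a finite prefix of the input. This is a sketch under assumptions: `pending(e, letter)` is a hypothetical predicate standing for “$e$ is letter-irreducible”, rule 1 of Definition 21 is followed when the state is empty, and for a nonempty state we assume that the obligations that are still pending are retained; that last update is an illustrative reading of rule 2, not a transcription of it. An infinite run is accepting when it returns to $\emptyset$ infinitely often.

```python
from typing import Callable, FrozenSet, Iterable, List

Pending = Callable[[str, FrozenSet[str]], bool]  # hypothetical irreducibility test


def eventuality_step(s: FrozenSet[str], letter: FrozenSet[str],
                     eventualities: FrozenSet[str], pending: Pending) -> FrozenSet[str]:
    """One deterministic transition of a pending-obligation automaton."""
    if not s:
        # Rule 1: load the eventualities of the letter that are still irreducible.
        return frozenset(e for e in eventualities if e in letter and pending(e, letter))
    # Assumed update for a nonempty state: keep obligations that remain pending.
    return frozenset(e for e in s if pending(e, letter))


def run_states(letters: Iterable[FrozenSet[str]], eventualities: FrozenSet[str],
               pending: Pending) -> List[FrozenSet[str]]:
    """States visited from the initial state {} on a finite input prefix."""
    s: FrozenSet[str] = frozenset()
    states = [s]
    for letter in letters:
        s = eventuality_step(s, letter, eventualities, pending)
        states.append(s)
    return states


# Toy letters: "done:e" marks that obligation e has been fulfilled (an assumption).
pending = lambda e, i: f"done:{e}" not in i
letters = [frozenset({"e1", "e2"}), frozenset({"e1", "e2", "done:e2"}),
           frozenset({"e1", "done:e1"}), frozenset({"e2"})]
print(run_states(letters, frozenset({"e1", "e2"}), pending))
```

Because the state can only return to $\emptyset$ once every loaded obligation has been discharged, infinitely many visits to $\emptyset$ rule out an obligation that is postponed forever.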

Note, in particular, that $A_e(f)$ handles only unbounded liveness conditions. Time-bounded liveness conditions are handled by the combination of $A_e(f)$ and $A_t(f)$: $A_e(f)$ ensures that the required state is eventually reached (without regard to real time) and $A_t(f)$ ensures that the related timing constraints are met when the state is reached. A similar “communication” (via the “input” string) also occurs in the purely untimed case of FIL while dealing with eventualities that are bounded within intervals (Ramakrishna et al 1992): for checking an eventuality within a bounded context, the local automaton checks that the context does not end before the eventuality is found, a pure safety property; the eventuality automaton checks that the right end-point of the enclosing context does eventually occur, a pure liveness property.

Example 11. In our running example, $E(f)$ contains $\neg f_8$, $\neg f_{11}$ and $\neg f_{12}$. As in the previous two examples, we illustrate the accepting run of $A_e(f)$ on $\mathcal{M}^f$ in figure 7. The states shown are $\emptyset$, $E_1 = \{\neg f_8, \neg f_{12}\}$ and $E_2 = \{\neg f_{11}\}$.
Figure 7. A run of $A_e$ for example 11.


Note how $A_e(f)$ is always one step “behind” $A_u(f)$: $A_u(f)$ is non-deterministic, while $A_e(f)$ is fully deterministic, allowing precisely one transition on any input. Note also that neither automaton cycles, in the terminology of fundamental-mode asynchronous automata; i.e. on any input stream consisting of precisely one input letter, there is at most one state change.

4.5 Combining the automata 

The decision procedure is now straightforward. We construct $A_u(f)$ and augment it using the timing construction to obtain $A_t(f)$. We then take the product of $A_t(f)$ with the eventuality automaton $A_e(f)$, where $A_e$ is run on the untiming of the input string. Finally, we check the emptiness of the resulting timed automaton $A_m(f)$, using the emptiness algorithm of Alur & Dill (1990). We thus have our main theorem.

Theorem 4. [DECISION PROCEDURE] Given an RTFIL formula f, it is decidable 
whether or not f is satisfiable. 

The main lemma required in the proof of theorem 4 is 

Lemma 7. The language of $A_m(f)$ is empty iff $f$ is not satisfiable.

Proof. The proof follows from the Completeness and Soundness lemmas below. The proofs 
of the two lemmas follow the usual format of playing off the semantics of formulae against 
the allowed runs of the automaton, and are sketched in the next section. 

Lemma 8. [COMPLETENESS] Let $f$ be an RTFIL formula and $M$ a satisfying model for it. Then $\mathcal{M}^f$ is accepted by $A_m(f)$.

Lemma 9. [SOUNDNESS] Let $f$ be an RTFIL formula, $M'$ a timed string accepted by $A_m(f)$, and $M$ the restriction of $M'$ to the primitive propositions. Then $M \models f$.

The construction for our decision procedure shows, once again, that RTFIL is invariant under finite infinitesimal timed stuttering. This was stated and proved directly in theorem 1, but is further clarified by noting that the local TBA $A_t(f)$ has a reflexive transition relation, with the self-loops containing edge conditions of the form $c < d$ only and no clock actions.
5. Proof of correctness

We devote the next two sections to proving the Soundness and Completeness lemmas. 

5.1 Completeness 

Throughout this section we assume that $\mathcal{M}^f$ is the extension of a satisfying model $M$ for $f$, as stated in the Completeness Lemma. Moreover, we use the timed $\omega$-string representation for $\mathcal{M}^f$. It is easy to see that admissibility of $\mathcal{M}^f$ implies that there is a timed $\omega$-string representation for it. Note that any of the uncountably many representations suffices for our purposes. However, for convenience, we use a “canonical” representation, with $\mathcal{M}^f$ represented by the timed $\omega$-string $(\sigma_i, t_i)_{i \in \omega}$, defined inductively as follows (let $t_{-1} = 0$):

$\sigma_i = \mathcal{M}^f(t_{i-1})$

$t_i = \inf\left(\{t > t_{i-1} \mid \mathcal{M}^f(t) \neq \sigma_i\} \cup \{\lfloor t_{i-1}\rfloor + 1\}\right)$
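The canonical representation can be computed for any piecewise-constant function of time given by left-closed pieces. In the sketch below the function is the primitive-proposition model $M$ of Example 4 rather than its extension $\mathcal{M}^f$ (whose values we do not list), and the encoding and names are our own. The cap $\lfloor t_{i-1}\rfloor + 1$ forces at least one time point per unit interval, so the times $t_i$ diverge.

```python
import math
from bisect import bisect_right
from typing import FrozenSet, List, Tuple

# Hypothetical encoding: sorted (start, value) pairs, the first start being 0,
# each value holding on the left-closed piece [start, next start).
Model = List[Tuple[float, FrozenSet[str]]]


def value_at(model: Model, t: float) -> FrozenSet[str]:
    starts = [start for start, _ in model]
    return model[bisect_right(starts, t) - 1][1]


def next_change(model: Model, t: float, current: FrozenSet[str]) -> float:
    """inf{t' > t | value(t') != current}; a breakpoint, or infinity if none."""
    for start, value in model:
        if start > t and value != current:
            return start
    return math.inf


def canonical_prefix(model: Model, n: int) -> List[Tuple[FrozenSet[str], float]]:
    """First n pairs (sigma_i, t_i) of the canonical timed omega-string."""
    pairs = []
    prev = 0.0  # t_{-1} = 0
    for _ in range(n):
        sigma = value_at(model, prev)  # sigma_i = value at t_{i-1}
        t_i = min(next_change(model, prev, sigma), float(math.floor(prev) + 1))
        pairs.append((sigma, t_i))
        prev = t_i
    return pairs


example4 = [(0.0, frozenset()), (1.0, frozenset({"p"})), (7.0, frozenset({"p", "q"}))]
for sigma, t in canonical_prefix(example4, 8):
    print(sorted(sigma), t)
```

On this model the representation stutters on $\{p\}$ at the integer points $2, \ldots, 6$, which is harmless because, as noted above, $A_u(f)$ and $A_t(f)$ allow self-loops.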

The proof of the Completeness Lemma follows from lemmas 12 and 14. Proofs of these lemmas make use of several intermediate lemmas.

Lemma 10. $A_u(f)$ accepts $\mathrm{untime}(\mathcal{M}^f)$.

Proof. Observe first that since all states of $A_u(f)$ are accepting, we need only show that
there is an infinite run of $A_u$ that consumes $(\sigma_i)_{i \in \omega} = \mathrm{untime}(M^f)$. Since each $\sigma_i$
is Hintikka, it is a state of $A_u$. By clause 1 in the definition of $\rho_u$ (see definition 18), $A_u$
can consume the input symbol $\sigma_i$ iff it is in the state $\sigma_i$. Thus, if $A_u$ has an infinite
run consuming $\mathrm{untime}(M^f)$, that run is unique. That it has an infinite run is shown by induction
on the length of the run.

Base Case. Since $(M, 0) \models f$, we have $f \in M^f(0)$ and, therefore, $f \in \sigma_0$. Thus $\sigma_0$
is an initial state of $A_u$.

Inductive Step. We need to show that $\sigma_{i+1} \in \rho_u(\sigma_i, \sigma_i)$. Assume not. Then $\sigma_i \neq \sigma_{i+1}$,
since $\rho_u$ allows self-loops. This means that $t_i$ is the least $t'$ satisfying $t' > t_{i-1}$ and
$M^f(t_{i-1}) \neq M^f(t')$. By assumption, the transition $\sigma_i \to \sigma_{i+1}$ then violates one
of the last four transition requirements of $\rho_u$. Using the definition of extension, however, this
contradicts lemma 4. □

Lemma 11. If a timed ω-string $(\sigma_i, t_i)_{i \in \omega}$ is accepted by $A_t$, then the accepting run is
unique.

Proof. From the definition of the transition function of $A_t$, the BA $A_u$ must accept the
untimed string $(\sigma_i)_{i \in \omega}$. From the proof of lemma 10, the run of $A_u$ on $(\sigma_i)_{i \in \omega}$ must be unique.
Recall that the state of $A_t$ consists of two components: a Hintikka set and a set of active
clocks. From the above, it is clear that the “Hintikka component” of the run of $A_t$ on
$(\sigma_i, t_i)_{i \in \omega}$ is unique. What remains to be shown is that the “clock component” is also unique.

To see that this is indeed the case, we note from definition 19 that the clocks that are
active in the state following a transition are uniquely determined by the Hintikka components
of the states adjoining the transition and by the clocks that are active in the state prior to the
transition. Moreover, the clocks that are active in the initial state are uniquely determined
by the Hintikka component of that state. Since the Hintikka component is uniquely
determined, so also is the clock component, and the result follows. □

Lemma 12. $A_t(f)$ accepts $M^f$.

Proof. Assume, for a contradiction, that it does not. From lemma 10 we know that $A_u$
accepts $\mathrm{untime}(M^f)$. If $A_t$ rejects, it must be because some clock condition introduced
by the timing augmentation is not satisfied along the run. Before we proceed
with the proof, we introduce some terminology.

DEFINITION 22 

For a run of $A_t$ over an extended model $M^f$, a timer thread $c^{y,X,d}[t_b, t_e)$ is a finite chain
of clocks $(c_1^{y,X_1,d}, \ldots, c_n^{y,X_n,d})$ such that

1. $X = X_1$

2. for all $i \in \{1, \ldots, n-1\}$, $X_{i+1} \subset X_i$

3. a transition at time $t_b$ activates $c_1^{y,X_1,d}$

4. there is a strictly monotonically increasing sequence of time values $(t_1, \ldots, t_{n-1})$ with
$t_b < t_1$ and $t_{n-1} < t_e$, such that a transition at time $t_i$ copies $c_i^{y,X_i,d}$ into $c_{i+1}^{y,X_{i+1},d}$, and no
transition at any time strictly between $t_i$ and $t_{i+1}$ deactivates $c_{i+1}^{y,X_{i+1},d}$

5. a transition at time $t_e$ deactivates $c_n^{y,X_n,d}$

A timer thread is incomplete if the deactivating transition at $t_e$ also copies the last clock
to another clock.

A timer thread is useless if the transition at $t_e$ deactivates the last clock without copying
it and the remaining interval $X_n$ does not collapse in the state following the transition.

A timer thread is a complete useful thread if $X_n$ has a collapsing reduction in the state
following the transition at $t_e$.
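The three-way classification of Definition 22 depends only on what happens at the deactivating transition at $t_e$. As a hedged illustration, it can be written as a small decision function. The record type and field names below are hypothetical and are not part of the construction. The sketch also assumes that copying the last clock takes precedence over the other two cases, since a thread whose clock is copied is still in progress.

```python
from dataclasses import dataclass
from enum import Enum


class ThreadKind(Enum):
    INCOMPLETE = "incomplete"            # verification still in progress
    USELESS = "useless"                  # verification abandoned
    COMPLETE_USEFUL = "complete useful"  # timing constraint successfully verified


@dataclass(frozen=True)
class EndTransition:
    """Hypothetical summary of the transition at t_e that deactivates the last clock."""
    copies_last_clock: bool   # the last clock is copied into another clock
    interval_collapses: bool  # X_n has a collapsing reduction in the following state


def classify(end: EndTransition) -> ThreadKind:
    # Assumption: a copy means the verification continues, whatever else happens.
    if end.copies_last_clock:
        return ThreadKind.INCOMPLETE
    if end.interval_collapses:
        return ThreadKind.COMPLETE_USEFUL
    return ThreadKind.USELESS
```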

The intuition behind this terminology is as follows. A complete useful timer thread
represents a successful verification of a timing constraint. A useless thread represents a
verification that was started but was later abandoned, because the corresponding timing
constraint was subsumed by another timing constraint whose verification was in progress.
An incomplete thread represents a verification that is in progress and that may be either
completed into a successful verification or abandoned in the future. The reader should
observe that, with the transition function for $A_t$ defined in definition 20, if $y = \beta$, then
every incomplete thread eventually completes usefully. On the other hand, incomplete
α-threads may become useless. It is easy to see that each active clock in any run belongs
to precisely one incomplete thread and, moreover, each incomplete thread corresponds to
precisely one active clock in any state. Also, threads may complete usefully or become
useless as a run progresses, but they never fork or merge. Therefore, starting from an active
clock and tracing back along the thread to which it belongs, one can locate its “ultimate
ancestor”, that is, the initial clock created for the verification of a timing constraint. The value
of the active clock indicates the time for which the thread has been active since its ultimate
ancestor was activated.
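Threads never fork or merge, so each clock can be pictured as holding a single back-pointer to the clock it was copied from. The Python sketch below uses hypothetical helpers: the `Clock` record, `copy_clock` and `thread_value`. It traces an active clock back to its ultimate ancestor and computes the clock's value as the time elapsed since that ancestor was activated. This is the value a copy that preserves the clock value would hold.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Clock:
    name: str
    activated_at: float               # time of the transition that created this clock
    parent: Optional["Clock"] = None  # clock it was copied from; None for the ancestor


def copy_clock(src: Clock, name: str, at: float) -> Clock:
    """A copy continues the same thread: it simply records its predecessor."""
    return Clock(name, at, parent=src)


def ultimate_ancestor(c: Clock) -> Clock:
    # Threads are linear chains (no forks or merges), so following parents terminates
    while c.parent is not None:
        c = c.parent
    return c


def thread_value(c: Clock, now: float) -> float:
    """Value of an active clock: time since its thread's ultimate ancestor was activated."""
    return now - ultimate_ancestor(c).activated_at


c1 = Clock("c1", activated_at=1.0)
c2 = copy_clock(c1, "c2", at=2.5)
c3 = copy_clock(c2, "c3", at=4.0)
```

For instance, `thread_value(c3, 5.0)` is 4.0, not 1.0, because the copies made at times 2.5 and 4.0 do not restart the measurement.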

Proof of lemma 12 (continued). We need to show that $A_t(f)$ consumes the timed ω-string
$(\sigma_i, t_i)_{i \in \omega}$ representing $M^f$. We show that for all $j \in \omega$, the TBA $A_t(f)$ consumes the
prefix $(\sigma_i, t_i)_{i < j}$ in a run $\tau^j$ that ends in the state $(\sigma_j, a_j)$, where $a_j$ consists of the
clocks that terminate the incomplete timer threads induced by $\tau^j$.

The base case of $j = 0$ follows immediately from the definition of initial states of $A_t$.

For the inductive step, assume that the above holds for $j$.

We first show that $\rho_t$ allows the transition $(\sigma_j, a_j) \xrightarrow{\sigma_j,\, C,\, \varphi} (\sigma_{j+1}, a')$, where $a'$ consists of
all clocks activated by the transition and all clocks of $a_j$ not deactivated by the transition,
$C$ satisfies the clauses of definition 20 that govern the clock actions of the timing augmentation, and $\varphi$ satisfies
the clauses that govern its edge conditions. Because of lemmas 10 and 11, all we need show is that $a'$ does
not contain a pair of active β-clocks representing two distinct incomplete timer threads
$c^{\beta,X,d}[t_{i(1)}, t_j)$ and $c^{\beta,X',d}[t_{i(2)}, t_j)$, with $t_{i(1)} \neq t_{i(2)}$, which merge at $t_j$. The starting
of a β-clock at $t_{i(1)}$ implies, from definition 19, clause 2, that $X \rightarrow \mathrm{len}(0, d] \in \sigma_{i(1)}$ and
$\neg\mathrm{len}(0, d] \in \sigma_{i(1)}$. The semantics then imply that $t_j + x = t_{i(1)} + d$, where $x$ represents
the duration of the common suffix that will be timed by the thread following the merge at $t_j$. Arguing
similarly for the case of the second clock, we have $t_j + x = t_{i(2)} + d$; together, these
contradict the assumption that $t_{i(1)} \neq t_{i(2)}$.

Next we show that the run $\tau^{j+1}$ of $A_t(f)$ obtained by extending $\tau^j$ by the above
transition consumes $(\sigma_i, t_i)_{i \le j}$. For this we need only show that the timing conditions
required by clauses 6 and 7 of definition 20 are not violated. For the case of clause 6,
consider an active α-clock in $a_j$ representing the timer thread $c^{\alpha,X,d}[t_i, t_j)$. The starting
of an α-clock at $t_i$ implies, from definition 19, clause 1, that $X\,\mathrm{len}(0, d] \in \sigma_{i+1}$. The
semantics then tell us that $t_j < t_i + d$. The value of the clock, $t_j - t_i$, cannot then exceed
$d$. The case of clause 7 is similar.

Finally, we note that $a'$ consists of the clocks terminating the incomplete timer threads
induced by the run $\tau^{j+1}$. This follows immediately from definition 22 for timer threads
and definition 19 for the clock operations. □

For the proof of lemma 14, the following simpler lemma is useful. 

Lemma 13. Let $\tau^u$ and $\tau^e$ represent, respectively, the runs of $A_u(f)$ and $A_e(f)$ on some
ω-string $\sigma \in (2^{\mathrm{scl}(f)})^{\omega}$. Then for all $i \in \omega$, $\tau^e_i \subseteq \tau^u_i$.

Proof. We first make the following observation about the statement of the lemma. As we
have seen, on any ω-string on which $A_u$ has a run, it has a unique run. Moreover,
as we show below in the proof of the next lemma, $A_e$ has a unique run on an arbitrary
input. Thus the runs $\tau^u$ and $\tau^e$ are unique.

The proof of the lemma follows by induction on the index $i$ of the run, as follows:

Base Case. Every Hintikka set has some element, thus $\tau^u_0 \neq \emptyset$; but $\tau^e_0 = \emptyset \subseteq \tau^u_0$.

Inductive Step. Assume that $\tau^e_i \subseteq \tau^u_i$. We consider two cases.

Case 1 [$\tau^e_i = \emptyset$]. First note that a Hintikka set contains its basis, since it is closed under
reductions, and every Hintikka set contains $\mathrm{true} \in E(f)$. From the transition conditions
of $A_e$, therefore, $\tau^e_{i+1}$ is a subset of the $i$th input, which is $\tau^u_i$. Further, by the transition
conditions of $A_u$, $\tau^u_{i+1}$ contains all irreducible formulae in $\tau^u_i$, so that $\tau^e_{i+1} \subseteq \tau^u_{i+1}$.

Case 2 [$\tau^e_i \neq \emptyset$]. Since $\tau^e_{i+1}$ consists of the irreducible subset of $\tau^e_i$, and $\tau^u_{i+1}$ contains all
irreducible formulae of $\tau^u_i$, using the induction hypothesis we have $\tau^e_{i+1} \subseteq \tau^u_{i+1}$. □

Lemma 14. $A_e(f)$ accepts $\mathrm{untime}(M^f)$.

Proof. Observe, first, that $A_e(f)$ is deterministic, and in every state has a (unique) transition
for every input letter from $2^{\mathrm{scl}(f)}$. Thus, there is a unique infinite run of $A_e$ on $\sigma =
(\sigma_i)_{i \in \omega} = \mathrm{untime}(M^f)$ that consumes $\sigma$; call this run $\tau^e$. Since $\tau^e_0 = \emptyset$, if $\tau^e$ is not
accepting, then there is a largest $i$ such that $\tau^e_i = \emptyset$.

From the second transition rule for $A_e$ in definition 21 and the definition of reduction, we
can conclude, for any two consecutive states $s \neq \emptyset$ and $t \neq s$ of $A_e$ such that $\rho_e(s, a) = t$ for some input letter $a$,
that $\mathrm{size}(t) < \mathrm{size}(s)$.⁹ By the well-foundedness of size, there is some $j > i$ such that for
all $k > j$, $\tau^e_k = \tau^e_j \neq \emptyset$. Thus there is some formula $\neg[\theta_1 \mid \rightarrow)\,\mathrm{false} \in \tau^e_k$ for all $k > j$.
Without loss of generality assume that $\theta_1$ is $\rightarrow\alpha, \theta'$.

By the definitions of $\rho_e$ and reducibility, then, $\alpha \notin \sigma_k$ for all $k > j$. Completeness of $\sigma_k$
implies that $\neg\alpha \in \sigma_k$ for all $k > j$, whence the definition of an extension and the semantics
yield $(M, t_j) \models [\rightarrow\alpha, \theta' \mid \rightarrow)\,\mathrm{false}$.

But from the proof of lemma 10 we have $\tau^u_k = M^f(t_{k-1})$, where $\tau^u$ denotes
the infinite run of $A_u$ on $\sigma$. By lemma 13, $\tau^e_k \subseteq \tau^u_k$, so $\neg[\rightarrow\alpha, \theta' \mid \rightarrow)\,\mathrm{false} \in M^f(t_{k-1})$.
By the definition of extension and the semantics, then, $(M, t_{k-1}) \not\models [\rightarrow\alpha, \theta' \mid \rightarrow)\,\mathrm{false}$, a
contradiction. □
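The acceptance argument for $A_e$ rests on one property. A deterministic automaton has exactly one run on each input, and that run is accepting iff the empty state recurs infinitely often. For an ultimately periodic input $u\,v^\omega$ this can be checked mechanically. The following Python sketch shows the standard technique for deterministic Büchi automata on such lasso words. The automaton used here is a hypothetical toy obligation tracker in the spirit of $A_e$: pending eventualities are loaded when the state is empty and discharged as they are fulfilled. It is not the construction of definition 21.

```python
from typing import Callable, FrozenSet, Hashable, List, Sequence, TypeVar

S = TypeVar("S", bound=Hashable)
A = TypeVar("A")


def accepts_lasso(delta: Callable[[S, A], S], init: S, accepting: Callable[[S], bool],
                  prefix: Sequence[A], loop: Sequence[A]) -> bool:
    """Does the unique run of a deterministic Buchi automaton on prefix . loop^omega accept?"""
    if not loop:
        raise ValueError("loop must be non-empty")
    state = init
    for a in prefix:
        state = delta(state, a)
    start_of_pass = {}    # state at the start of a pass over `loop` -> pass index
    hit: List[bool] = []  # per pass: was an accepting state reached during it?
    while state not in start_of_pass:  # terminates when the state space is finite
        start_of_pass[state] = len(hit)
        seen_accepting = False
        for a in loop:
            state = delta(state, a)
            seen_accepting = seen_accepting or accepting(state)
        hit.append(seen_accepting)
    # the passes from the repeated start state onward recur forever
    return any(hit[start_of_pass[state]:])


def tracker(pending: FrozenSet[str], letter: FrozenSet[str]) -> FrozenSet[str]:
    """Toy tracker: a "req" creates the pending eventuality "grant"."""
    if not pending:  # empty state: load the eventualities raised by this letter
        pending = frozenset({"grant"}) if "req" in letter else frozenset()
    return pending - letter  # discharge whatever this letter fulfils


def is_empty(s: FrozenSet[str]) -> bool:
    return not s
```

With the loop `[{"req"}, {"grant"}]` the empty state recurs in every pass, so the run is accepting. With the loop `[{"req"}]` the obligation is never discharged, so the run is rejecting.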

5.2 Soundness 

The proof consists of showing that, given a (timed) string in the language of $A_m$, one can
construct a satisfying model for $f$. Let $(\sigma_i, t_i)_{i \in \omega}$ be a string in the language of $A_m$, and
let $M'$ be defined by

$M'(t) = \{\, f_1 \in \mathrm{scl}(f) \mid f_1 \in \sigma_i,\ t \in [t_{i-1}, t_i) \,\}$

where we have assumed $t_{-1} = 0$. Moreover, let $M$ be defined by

$M(t) = \{\, p \in \mathcal{P} \mid p \in M'(t) \,\}$

To prove the lemma, we want to show that $(M, 0) \models f$.
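The definition of $M'$ amounts to a lookup. $M'(t)$ is the label set $\sigma_i$ of the unique interval $[t_{i-1}, t_i)$ that contains $t$, and $M$ keeps only the primitive propositions of that set. Below is a minimal Python sketch for a finite prefix. The function names, and the representation of formulae as strings, are assumptions made for illustration. Using `bisect_right` also handles repeated time stamps correctly, because an empty interval $[t_i, t_i)$ is simply skipped.

```python
from bisect import bisect_right
from typing import FrozenSet, Sequence


def m_prime(sigmas: Sequence[FrozenSet[str]], times: Sequence[float], t: float) -> FrozenSet[str]:
    """M'(t) = sigma_i for t in [t_{i-1}, t_i), with t_{-1} = 0 and times nondecreasing."""
    if t < 0:
        raise ValueError("M' is only defined for t >= 0 (t_{-1} = 0)")
    i = bisect_right(times, t)  # number of t_j with t_j <= t
    if i >= len(sigmas):
        raise ValueError("t lies beyond the finite prefix")
    return sigmas[i]


def m_restrict(labels: FrozenSet[str], props: FrozenSet[str]) -> FrozenSet[str]:
    """M(t): keep only the primitive propositions."""
    return labels & props


sigmas = [frozenset({"p", "f1"}), frozenset({"q"}), frozenset({"p", "q"})]
times = [1.0, 1.0, 3.0]  # [1.0, 1.0) is empty, so sigma_1 never holds
```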

Lemma 15. For any $t \in \mathbb{R}$ and $f_1 \in \mathrm{scl}(f)$, $f_1 \in M'(t)$ iff $(M, t) \models f_1$.

Proof. For a given $t$, we induct on the inclusion order induced by scl on the formulae in
$\mathrm{scl}(f)$. Let $t \in [t_{i-1}, t_i)$ as defined above.

Base Case. Consider a primitive proposition $p \in \mathcal{P}$. For the forwards direction, let
$p \in M'(t)$, so $p \in M(t)$, whence the semantics give us $(M, t) \models p$. For the backwards
direction, let $(M, t) \models p$, so $p \in M(t)$, so $p \in M'(t)$, by construction.


⁹For a set of formulae $F$, let $\mathrm{size}(F) = \sum_{f' \in F} \mathrm{size}(f')$.
Inductive Step. Assume that the lemma holds for all $f' \in \mathrm{scl}(f_1)$ where $\mathrm{scl}(f') \subset
\mathrm{scl}(f_1)$. We can then show, by a case analysis of the structure of $f_1$, that the lemma
holds for $f_1$ also. The details are routine but extremely tedious and are therefore omitted.
However, we present a sample case below to illustrate the argument.

Consider the case of $f_1 = [\theta_1 \mid \theta_2)\, f_2$. Assume for the forwards direction that $f_1 \in M'(t)$.
We have two subcases, depending on whether or not $f_1$ is $M'(t)$-reducible.

Subcase 1 [reducible]. By our construction, $f_1$ is $\sigma_i$-reducible. Let $f_1'$ and
$F \subseteq \sigma_i$ be such that $f_1'$ is a reduct of $f_1$ with reductors $F$. Then, since $\sigma_i$ is Hintikka, $f_1' \in \sigma_i$, so by construction
$f_1' \in M'(t)$ as well as $F \subseteq M'(t)$. Now, as the subformula closures of all the reductors
and reducts of a formula $f$ are strictly contained in $\mathrm{scl}(f)$, the induction hypothesis and
Corollary 1 give us the result.

Subcase 2 [irreducible]. By construction, $f_1 \in \sigma_i$ is $\sigma_i$-irreducible. By our
earlier observations, since $\sigma = (\sigma_j)_{j \in \omega}$ is accepted by $A_u(f)$, we may consider $\sigma$ to be the
run of $A_u$ on $\sigma$. By the transition conditions of $A_u$, $f_1 \in \sigma_j$ for all $j$ with $i \le j \le k$, where $k$ is
the least index greater than $i$ such that $f_1$ is $\sigma_k$-reducible. To see that such a finite $k$ must
exist, use the acceptance criteria for $A_e$ along with the fact that both $\neg[\theta_1 \mid \rightarrow)\, f_2 \in \sigma_i$ and
$\neg[\theta_2 \mid \rightarrow)\, f_2 \in \sigma_i$, since $f_1$ is $\sigma_i$-irreducible. At index $k$, we use an argument identical to
Subcase 1 above to establish that $(M, t_{k-1}) \models f_1$. The fact that $f_1$ is irreducible in
the intervening period allows us to use the induction hypothesis on $\mathrm{red}(f_1)$ and $(k - 1 - i)$
applications of lemma 4 to conclude that $(M, t) \models f_1$.

The backwards direction is similar. For further details, we refer the reader to
Ramakrishna (1993). □

The soundness lemma follows since $f \in M'(0)$, so that $(M, 0) \models f$ by lemma 15.

6. Complexity

Let $f$ be an RTFIL formula of size $n$ and depth $k$, and let $T$ be the size of the encoding
of the largest finite timing constant appearing in $f$. By lemma 5, $|\mathrm{scl}(f)| = O(n^k)$. Clearly,
$A_u(f)$ and $A_e(f)$ can have at most $2^{O(n^k)}$ states each. The timing augmentation can
introduce up to $O(n^{2k})$ clocks. Following the elimination of copying actions, thus, $A_t(f)$
(and consequently also $A_m(f)$) can have at most $2^{O(n^{2k} \cdot k \log n)}$ states and $O(n^{2k})$ clocks.
The final emptiness check has a complexity of $O(C! \cdot (S + E) \cdot 2^{T \log T})$, where $C$ is the size
of the clock-set, $S$ and $E$ are the number of states and edges in the TBA, and $T$ is the size of
the binary encoding of the constants appearing on the edge conditions of the TBA (Alur &
Dill 1990). The overall complexity of the decision procedure is thus $2^{O(n^{2k} \cdot 2k \cdot \log n + T \log T)}$.
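The exponent in the overall bound can be traced term by term. The short derivation below takes the emptiness bound as stated above and writes $\log$ for $\log_2$. It also assumes that the number of edges is at most $S^2$ times the alphabet size $2^{|\mathrm{scl}(f)|}$; this is our assumption, not the paper's.

```latex
\begin{align*}
C &= O(n^{2k}) \;\Rightarrow\; \log C = O(2k \log n),\\
C! &\le C^{C} = 2^{C \log C} = 2^{O(n^{2k} \cdot 2k \log n)},\\
S &= 2^{O(n^{2k} \cdot k \log n)}, \qquad
E \le S^{2} \cdot 2^{|\mathrm{scl}(f)|} = 2^{O(n^{2k} \cdot k \log n)} \cdot 2^{O(n^{k})},\\
C! \cdot (S + E) \cdot 2^{T \log T} &= 2^{O(n^{2k} \cdot 2k \log n \,+\, T \log T)}.
\end{align*}
```

The factor $C!$ dominates. This is consistent with the number of clocks being identified as the main source of the blow-up.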

The main source of the blow-up is the large number of clocks. Note, however, that
usually the number of clocks will be much less than that indicated by the large upper bound,
because timing conditions in specifications will generally involve relations between a few
simple predicates rather than long sequences of events. As a result, the overall complexity
will be closer to $2^{O(n^k + C \cdot k \log n + T \log T)}$, where $C$ is the number of clocks introduced in
the timing augmentation. Comparing this with the $2^{O(n^k)}$ upper bound for FIL, the price
for real time is seen to be an additional factor exponential in the number of timers and the
constants appearing in the specification. However, the decision procedure is still doubly
exponential (deterministic time), essentially the same as for the timeless logic FIL. In fact,
by combining the PSPACE-containment of the emptiness problem for TBAs (Alur & Dill
1990) with the EXPSPACE-encoding of the automaton constructed in the last section, it
can be shown that RTFIL is in EXPSPACE.¹⁰

The procedure given can be adapted in a straightforward manner to obtain a model-checking
algorithm for RTFIL having the same complexity with respect to the input formula
and linear in the size of the input model (for instance, in the form of a fair-transition
system).

Analogous to the result in Ramakrishna et al (1992) we can show that if we bound 
the largest constant appearing in a formula and the largest depth of nesting of interval 
modalities, then this bounded version of satisfiability for RTFIL is PSPACE-complete in
the size of the formula. This result is more indicative of the type of scaling behaviour one 
would expect for the logic. 


7. Related work 

The idea of bounding the duration of intervals was first articulated by Melliar-Smith in 
an early paper on real-time interval logic (Melliar-Smith 1987). Subsequent proposals for 
real-time interval logics appear in Narayana & Aaby (1988) and Razouk & Gorlick (1989). 
However, none of these proposals provided decision procedures for the logics presented. 
In fact the logic of Razouk & Gorlick (1989) is so powerful that it is highly undecidable. 
The logics of Narayana & Aaby (1988) and of Melliar-Smith (1987) allow the expression 
of the forbidden “punctuality” construct of Alur et al (1991), so that they can be shown to 
be undecidable if interpreted over a dense time domain. 

Consider an extension of RTFIL by allowing searches of the form $+d$ for $d \in \mathbb{Q}$.
The semantics of such a search is that it locates a point t' in the future of the point t where 
the search began such that t' = t + d. It thus allows relatively natural expression of many 
real-time constructs. However, it is not difficult to show that this simple extension (with no 
other restrictions) makes the resulting logic undecidable (Ramakrishna 1993). The proofs 
of undecidability of all these logics follow essentially along the lines of Alur et al (1991), 
by reduction from the halting problem for two-counter Minsky machines. 

Another possible extension is to consider backwards searches, for instance $\leftarrow f$. We
have shown that even in the absence of real time, this construct leads to non-elementariness
(decidability of the logic with backwards searches, but without real time, follows by
translation to S1S). The proof of non-elementariness (Ramakrishna 1993) is by reduction from
the non-emptiness of complement problem for extended star-free regular expressions.

Decidable dense real-time logics are relatively rare because a dense real-time logic must
tread a fine line between expressiveness and undecidability. The logics RTFIL and MITL
(Alur et al 1991) adopt different compromises, and neither, we believe, subsumes the
other. MITL appears to have no direct way of expressing RTFIL formulae that


10 However, the best lower-bound we have is the PSPACE lower-bound for FIL. We refer the reader to Ramakrishna 
(1993) for related comments. 
constrain the length of an interval defined between the endpoints of a sequence of (more
than two) searches. Correspondingly, RTFIL cannot express the MITL construct $p\,\mathcal{U}_I\,q$,
which requires $q$ to occur within the time bounds denoted by $I$ (while not constraining its
occurrence outside that interval), and $p$ to hold until that occurrence.¹¹

In effect, RTFIL defines events in relation to other events, and then imposes real-time 
constraints on their relative occurrence. In contrast, MITL first defines real-time
intervals and then requires events within those intervals, possibly in relation to other events.
Thus, it appears that MITL may be better suited for synchronized real-time systems, where
the synchronization is by real time, whereas RTFIL may be more appropriate for
asynchronous real-time systems. A natural question, then, is whether there is a reasonable
combination of the two logics that retains decidability. We conjecture that the answer is
in the affirmative, and that a decision procedure for the combination would follow from a
suitable “composition” of the procedures for the two logics. This is the case, for instance, for
FIL and PTL(S, U), where such a “combined” decision procedure follows from purely
automata-theoretic methods (Ramakrishna 1993).

The Duration Calculus (Chaochen et al 1991) differs from RTFIL in that it treats intervals
as primitive. It is well suited to describing and reasoning about cumulative behaviour, a
feature especially useful for hybrid systems. The operator ∫ in that logic, for instance,
allows one to bound the duration of a (fragment of a) computation for which a predicate
holds. This ability to integrate over non-convex intervals, combined with the “non-local”
character of the logic, makes it very expressive. However, as shown in Chaochen et al
(1993), over dense time the simplest real-time fragment of the calculus is undecidable,
and even without real time the simplest fragment is non-elementary. We are currently
investigating an extension of RTFIL with ageing operators, inspired by the ∫ operator of
the Duration Calculus.


8. Conclusion 

We have presented a real-time interval logic RTFIL which conservatively extends the 
timeless logic FIL. The logic extends FIL in a natural way to allow real-time specification, 
without sacrificing decidability. We have presented a formal semantics for the logic and 
have given a decision procedure for it. That the decision procedure for RTFIL involves an additional exponential factor
proportional to the number of clocks and the constants appearing in the specification should
come as no surprise to those familiar with other dense-time logics.

A prototype RTFIL theorem-prover based on a tableau-theoretic analogue of the decision
procedure given in this paper has been implemented and used to verify some simple
real-time systems. However, further work is required before the system can become the basis
of a practical verification system for real-life examples. Apart from the use of efficient
data-structures, such as binary decision diagrams for state-encoding, efficient heuristics


11 In each case, the introduction of auxiliary predicates mitigates the problem. Note also that the logic TPTL (Alur & 
Henzinger 1989), with “freeze” quantification, can express the RTFIL property given earlier. Unfortunately, TPTL 
is undecidable when interpreted over a dense time domain. We must add, however, that MITL extended with past 
operators can express this property, although apparently less succinctly (see Ramakrishna 1993). This logic has a 
decidable validity problem. 
such as those used in Alur et al (1992) will need to be used in order to reduce the space
requirements for the verification. Since our procedure is automata-theoretic, it can directly
benefit from any advances in verification technology based on ω-automata.

We are also devising a proof calculus for the logic in the style of the natural deduction 
calculi that are now gaining popularity in many applications. The success or failure of 
an “expensive” logic such as RTFIL would depend crucially upon whether one is able to 
obtain a clean proof system. We consider our decision procedure an important first step 
in this direction. For instance, our reduction and transition rules can be seen as a form of 
“rewrite rules” for a tableau proof system. The incorporation of timers in a formal manner 
into such tableaux, however, presents non-trivial difficulties. One approach might be to use 
time variables with such operations as resetting, assignment, comparison and difference, 
to simulate the role of timers. However, such an approach is probably far too low-level 
to be useful. On the other hand, some appropriate mixture of automated inference within 
such a proof system, along with user assistance at crucial points, may be feasible. 

Finally, from a more theoretical standpoint, there are interesting expressiveness
questions regarding RTFIL and some other decidable real-time logics. The apparent duality
between our approach and that of other real-time temporal logics, as outlined in the
previous section, clearly merits further study. Another interesting direction involves identifying
a natural decidable fragment of parametric RTFIL, in the sense of Alur et al (1993). 


We thank Rajeev Alur for useful discussions, and for helpful comments on the conference 
version of the paper. 
Abstract. In this paper, we use perfectly synchronous languages such as Esterel
for modelling the Futurebus arbitration protocol. We show that perfect
synchrony aids in the formalization, testing, validation and verification of the
protocol. We discuss solutions to the above protocol and show that properties such as
mutual exclusion and deadlock-freedom can be established formally. Further,
we show how the simulators can be used for testing and validation, and how an
instantiation of the protocol can be verified through algebraic tools such as auto/autograph.

Keywords. Synchronous languages; Futurebus arbitration protocol; Esterel.

1. Introduction 

The concurrency and reactive behaviour intrinsic in the Futurebus protocol make it natural
to think of modelling the protocol in the following classes of languages:

1. Asynchronous languages: Here, a program is treated as a set of loosely coupled
independent execution units or processes, each process evolving at its own pace. Interprocess
communication is done by mechanisms such as message passing. Communication as a
whole is asynchronous in the sense that an arbitrary amount of time can pass between
the desire to communicate and the actual completion of the communication. This class includes languages
such as Ada, Occam, CSP etc.

2. Perfectly synchronous languages: In this class, programs react instantaneously to their
inputs by producing the required outputs. Statements evolve deterministically in a tightly coupled,
input-driven way, and communication is done by instantaneous broadcast,
where the receiver receives a message exactly at the time it is sent. That is, a perfectly
synchronous program produces its outputs from its inputs with no observable time delay.
Languages such as Esterel (Berry & Gonthier 1992; Berry 1992), Lustre (Halbwachs
et al 1991), Signal (Le Guernic et al 1991) and Statecharts (Harel 1987) belong to this
category.

Thus, one can have two points of view on this protocol. The first one corresponds to
asynchrony. Each module runs independently and all interleavings of actions are valid.
Analysis is difficult and it is not at all clear that the protocol does what it is intended to
do. The second point of view corresponds to the synchronous approach: there are common
instants shared by all processes. This limits the possible action interleavings and, for example,
forbids one module from continuously progressing while others remain blocked.

The synchronous approach is rather natural in a hardware implementation of the
protocol. In this context, global instants exist and are defined by reference to the basic clock. We
can expect behaviours to be reproducible, which certainly simplifies the debugging process.
On the other hand, the synchronous hypothesis is unrealistic in software implementations
where modules can be, for example, distributed over a network. In this case, (almost)
all interleavings are possible and there is no hope of remaining in the purely deterministic
case.

In this paper, we show how perfectly synchronous languages such as Esterel can be 
used for formal modelling, testing, validation and verification of the Futurebus protocol. 
We develop solutions in synchronous and asynchronous frameworks and discuss their 
advantages and difficulties. In particular, 

1. First we describe a solution in Esterel wherein 

• a module which is a Master Elect captures the bus and releases it immediately (so
that there is no clear distinction between the Master and the Master Elect).

• a module can enter into competition at any point. 

Later, we refine the solution to have a distinction between the Master Elect and the 
Master. After showing the correspondence of the solution with the specification of the 
protocol, we show that the solution satisfies the properties of mutual exclusion and 
deadlock freedom. However, the solution suffers from the possibility of a livelock due 
to the preemption specification in the protocol. 

2. Then, we describe an asynchronous solution using another reactive language, RC
(Boussinot 1991), in a distributed setup. Here, we discuss the need to restrict the
arbitration numbers to establish the properties of mutual exclusion and deadlock-freedom.
We also show how testing, together with a formal correctness proof for at least one instantiation of the
protocol, helps in gaining confidence in the solution. For the latter purpose (that is,
formal correctness), we use the auto/autograph tools (Roy & de Simone 1990).

2. Futurebus arbitration protocols 

Futurebus+ (IEEE 1991) is a set of tools with which to implement a bus architecture
providing performance scalability over both cost and time for multiple generations of
single- and multiple-bus multiprocessor systems. The arbitration specification
plays a crucial role in the performance of the system. In the following, we describe the
arbitration process as in IEEE (1991).
2.1 Arbitration process

1. When a module needs to send data to, or obtain data from another module, it must first 
gain tenure of the bus. Since two or more modules may seek to gain tenure of the bus at 
the same time, the arbitration process is used to restrict tenure of the bus to one module 
at a time. Since the arbitration process operates in parallel with, and independent of the 
data transfer process, an arbitration competition for control of the bus may take place 
concurrently with transactions on the parallel highway (main bus). 

2. Each module is assigned a unique arbitration number, which is used to resolve
arbitration competitions. When two or more modules compete for the bus, the winner is the
one whose arbitration number is the largest. Parallel contention arbitration is a process
whereby modules assert their unique arbitration numbers on the arbitration bus and
release signals according to an algorithm which, after a period of time, ensures that
only the winner’s number remains on the arbitration bus.

3. The value of the arbitration numbers used by modules in a competition determines
the sequence of the arbitration process. Because of limitations in the number of bus lines
used for arbitration, some numbers require two passes (or more; see the step below)
of the control cycle. The module with the highest arbitration number at the end of the
arbitration competition is referred to as the Master Elect. A master elect can take the
bus when the module using it releases it. On taking the bus, it is referred to as the Master
(a new competition for the bus can begin after the master elect becomes the master).

4. There may be times when a module has urgent need of the bus after a master elect has 
been chosen, but before the master elect has become master. If that module has a higher 
arbitration number than the master elect, it may initiate a new competition to establish 
a new master elect. This process is referred to as preemption. Preemption allows a 
high priority module to acquire tenure of the bus with minimum latency (although with 
some sacrifice to the overall performance of the system since it forces a new arbitration 
competition). 

In the following, we describe the abstraction of the protocol with an algorithm for solving 
contentions for the bus. The abstraction integrates the preemption mode described above. 

2.2 Abstraction of the protocol 

There are N modules running in parallel and sharing a bus made of P lines. Each module is
connected to the bus by the P lines. Each module has its own binary arbitration number of
length equal to the number of lines P (given a priori). The problem is to design a protocol
such that each module can get mutually exclusive access to the bus. The operations that
can be performed by the modules and the bus are given below.

Each module can perform the following operations: 

• Whenever the module wants to get access to the bus, it places its number on the
lines.

• After placing its value, it can read the value on the bus. 

• If its arbitration number is equal to the number on the bus then it gets the bus. 
Figure 1. The system (modules mod0, mod1 and mod2).


• If its arbitration number is not equal to it, then it performs the following operation:

Let the arbitration number of the module be $m_1, m_2, \ldots, m_P$ and the number on
the bus be $b_1, b_2, \ldots, b_P$. Let $k$ be the first digit position, from left to right, such that $b_k > m_k$.
Then the module puts the number $m_1, \ldots, m_{k-1}, 0, \ldots, 0$ on the lines,
waits for as long as necessary until the $k$th bit becomes zero, and then re-enters the competition.

• Each module can use the bus for a finite amount of time. 

The bus performs the following operations: 

• The value on the bus is equal to the bitwise “or” of the values put on the lines by all
the modules.

Now, the design of the protocol can be described as follows: 

Given that the arbitration numbers assigned to the modules are distinct and are not composed of just 1’s or just 0’s, design a protocol such that access to the bus is mutually exclusive and there is no deadlock.
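The precondition on arbitration numbers can be checked mechanically. The following C sketch is only an illustration, not part of the protocol: it assumes each number is stored as an unsigned integer of P bits with line 1 as the most significant bit, and the name `valid_numbers` is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical helper (not from the paper): returns true if the n
   arbitration numbers, each p bits wide with line 1 as the most
   significant bit, are distinct and none is all 0's or all 1's. */
static bool valid_numbers(const unsigned *nums, int n, int p) {
    assert(p > 0 && p < 32);                 /* keep the shift well defined */
    const unsigned all_ones = (1u << p) - 1u;
    for (int i = 0; i < n; i++) {
        if (nums[i] == 0u || nums[i] >= all_ones)
            return false;                    /* all 0's, all 1's, or too wide */
        for (int j = i + 1; j < n; j++)
            if (nums[i] == nums[j])
                return false;                /* numbers must be distinct */
    }
    return true;
}
```

With P = 4, the numbers 1100, 0010 and 1001 (0xC, 0x2, 0x9) pass the check, whereas any set containing 0000 or 1111, or a repeated number, is rejected.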

To fix ideas, consider the case where N equals 3 and P equals 4. This is shown in figure 1. The 3 modules are named mod0, mod1 and mod2, and their arbitration numbers are respectively 1100, 0010 and 1001. The order is the natural order 0010 < 1001 < 1100, so mod1 has the lowest priority and mod0 has the highest.

The behaviour of the protocol is summarized below:

1. Place the arbitration number on the lines.

2. Read the bus lines in order and compare them with the arbitration number. Because of the “or” function implemented by the bus, a module can never read a 0 on a line on which it has previously put a 1.
Now, when a module reads a 1 corresponding to a 0 in its arbitration number, it stops competing for a while and does not read the subsequent lines. Instead, it places 0’s on all the successive lines to give the others a chance to win, and remains blocked reading the same line until a 0 appears on the line on which it is blocked. This is done to avoid blocking (deadlock).

3. After the module has placed its number on the last line without getting stuck, it tests whether its number and the number on the bus are the same. If they are, the module knows that it is the winner. Otherwise, it infers that another module has already won over it, and it waits on the line corresponding to the most significant digit position at which its own digit is less than the digit on the corresponding bus line. To ensure that others are not blocked by its possible 1’s in the less significant positions after that position, it places a zero on all the successive lines.
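The round structure above can be illustrated by a small synchronous simulation. This C sketch is our own abstraction, not the Esterel or C code of the paper: it assumes all competing modules act in lock-step rounds, that the bus is the bitwise OR of what they place, and that a blocked module keeps its digits above the blocking line and places 0’s from there on. The names `contribution` and `arbitrate` and the fixed P = 4 are illustrative assumptions.

```c
#include <assert.h>

#define P 4   /* number of lines, as in the example of figure 1 (assumption) */

/* Value a module with number m places when it reads bus value b: at the
   first line k (from the MSB) where the bus shows 1 but m has 0, it keeps
   m_1..m_{k-1} and puts 0's from line k onward. */
static unsigned contribution(unsigned m, unsigned b) {
    for (int i = P - 1; i >= 0; i--) {
        unsigned bit = 1u << i;
        if ((b & bit) && !(m & bit))
            return m & ~((bit << 1) - 1u);   /* keep only the bits above line k */
    }
    return m;                                /* no conflict: the full number */
}

/* Runs synchronous rounds until the wired-OR bus value is stable and
   returns the index of the module whose number equals the bus, or -1.
   Each bus digit depends only on more significant digits of the previous
   round, so the lines settle within P + 1 rounds. */
static int arbitrate(const unsigned *nums, int n) {
    assert(n > 0);
    unsigned bus = 0u;
    for (int j = 0; j < n; j++)
        bus |= nums[j];                      /* every module places its number */
    for (int round = 0; round <= P; round++) {
        unsigned next = 0u;
        for (int j = 0; j < n; j++)
            next |= contribution(nums[j], bus);
        if (next == bus)
            break;                           /* the lines have settled */
        bus = next;
    }
    for (int j = 0; j < n; j++)
        if (nums[j] == bus)
            return j;                        /* this module sees its own number */
    return -1;
}
```

For the numbers 1100, 0010 and 1001 of figure 1 (stored as 0xC, 0x2 and 0x9), the lines settle on 1100 and `arbitrate` returns 0, i.e., mod0, the module with the highest number.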

3. Synchronous solution 

In this section, we describe a solution in Esterel and discuss its properties. An informal description of the kernel statements is given in appendix A; for details the reader may refer to Berry (1992) and Berry & Gonthier (1992).

3.1 Solution in Esterel 

A solution in Esterel with N = 3 and P = 4 is described below; the generalization to any N and P follows naturally. The solution has the following structure. Each module has two phases: reading and checking. In the reading phase, the module places its number on the lines; in the checking phase, it checks whether its number and the number on the bus are the same. If so, it captures the bus and releases it; otherwise, it waits for the higher-priority processes to finish and lets others win over it by placing 0’s from the point where it is stuck. The actual solution in Esterel is given below:

module Cell1:
input compete;
output next, raise;

every immediate compete do
   emit next; emit raise;
end.

module Cell0:
input compete, raised;
output next;
inputoutput stuck;

every immediate compete do
   do
      sustain stuck
   watching immediate [not raised];
   emit next
end.

module line:
input raise;
output raised;

loop
   present raise then await tick; emit raised
   else await tick
   end;
end.

module start_result:
input start, result;
inputoutput stuck;
output go, success;

signal lstuck in
   loop
      await start; emit lstuck;
      trap term in
         [ sustain go
         || await immediate [result and not lstuck];
            emit success; exit term
         || every immediate stuck do await tick; emit lstuck end
         ];
      end;
   end;
end.

module FUTUREBUS:
input start1, start2, start3;
output success1, success2, success3;
inputoutput stuck1, stuck2, stuck3;
output raise1, raise2, raise3, raise4;

signal raised1, raised2, raised3, raised4,
       result1, result2, result3 in

   signal pass1, pass2, pass3, go in
         run Cell1[signal go/compete, pass1/next, raise1/raise]
      || run Cell0[signal pass1/compete, pass2/next,
                   raised2/raised, stuck1/stuck]
      || run Cell1[signal pass2/compete, pass3/next,
                   raise3/raise]
      || run Cell1[signal pass3/compete, result1/next,
                   raise4/raise]
      || run start_result[signal start1/start,
                   success1/success, result1/result, stuck1/stuck]
   end signal
||
   signal pass1, pass2, pass3, go in
         run Cell0[signal go/compete, pass1/next,
                   raised1/raised, stuck2/stuck]
      || run Cell1[signal pass1/compete, pass2/next,
                   raise2/raise]
      || run Cell1[signal pass2/compete, pass3/next,
                   raise3/raise]
      || run Cell0[signal pass3/compete,
                   result2/next, raised4/raised, stuck2/stuck]
      || run start_result[signal start2/start,
                   success2/success, result2/result, stuck2/stuck]
   end signal
||
   signal pass1, pass2, pass3, go in
         run Cell1[signal go/compete, pass1/next, raise1/raise]
      || run Cell0[signal pass1/compete, pass2/next,
                   raised2/raised, stuck3/stuck]
      || run Cell0[signal pass2/compete,
                   pass3/next, raised3/raised, stuck3/stuck]
      || run Cell0[signal pass3/compete,
                   result3/next, raised4/raised, stuck3/stuck]
      || run start_result[signal start3/start,
                   success3/success, result3/result, stuck3/stuck]
   end signal
|| run line[signal raised1/raised, raise1/raise]
|| run line[signal raised2/raised, raise2/raise]
|| run line[signal raised3/raised, raise3/raise]
|| run line[signal raised4/raised, raise4/raise]
end.

An informal description of the program is as follows:

1. Each process is defined as a parallel composition of four cells (modules Cell0 or Cell1, which carry zero and one respectively) and an interface process modelled as an Esterel module start_result. The module start_result handles the starting of the process and the propagation of success. The bus is modelled as a parallel composition of four lines.

2. On receiving a compete input signal, the process initiates the competition; the first cell triggers the action of placing the number from the most significant position onwards, until the complete number is put on the arbitration bus; all the processes wanting to take the bus do the same. This corresponds to the registering phase of the specification. The reading is initiated by a compete, and the next cell is triggered by the next signal, after raise has been placed on the bus line if the cell carries a one. The completion of the registering phase is signalled by the respective result signal, and the checking phase starts. In the checking phase, the process proceeds as in the registering phase until its value and the bus value differ; on each “1” it sustains the raise. Once it finds that some other process has put a “1” where its own value is zero, it gets stuck; as the 1’s in the subsequent positions are not sustained, there is no explicit need for placing 0’s. At this point it generates a stuck signal.

3. A process that is stuck can reenter the registration phase in the next step, when the line on which it is waiting becomes zero (i.e., not raised). Then the reading (for the subsequent bits) and the checking (for the whole number) are repeated. If the process completes checking the lines twice in succession without getting stuck, it knows that it has the largest number; it then takes the bus and releases it.


3.2 Properties of the solution 

In this section, we argue that the solution satisfies the properties of mutual exclusion and deadlock-freeness. We assume that the arbitration numbers are distinct and are not composed of just 1’s or just 0’s.

PROPOSITION 1 

The above solution guarantees mutual exclusion. 

Proof. As the numbers are distinct, it is easily seen that there is a total order on the arbitration numbers. We can now prove mutual exclusion of access as follows. Assume that at some point in time a nonempty subset of modules enters the competition to capture the bus. If the subset is a singleton, the module clearly places its number, tests its identity and captures the bus. Now consider a subset of at least two modules. Since the arbitration numbers cannot be composed only of 1’s, at the end of the first round all the modules will be stuck. Each process withdraws from the competition by setting 0’s from the most significant digit, say k, at which its digit is less than the kth bus digit. The processes can reenter the competition only after the position at which they are waiting becomes zero. Because of the total order, and because all the blocked processes put zeros from the position at which they are blocked, only the modules with the highest number up to the kth position can enter the competition again. Note that more than one module may reenter the competition. Further, if two modules get stuck at the kth position, then their prefixes up to this digit must be exactly the same. Hence, if two modules were stuck at the last digit, they would be identical, which contradicts the assumption that the numbers are distinct. In the next round, in addition to the modules reentering the competition, new modules may enter it. In this round, the modules whose numbers are less than those blocked so far (up to the kth digit) will be blocked by the existing modules, and those that are greater can progress. After reentering the competition, the blocked processes only put their numbers from position k + 1 to the nth position. After the modules have put their numbers, they check their complete arbitration number against the number on the bus. As the numbers are distinct, there can be only one winner; hence mutual exclusion holds.

PROPOSITION 2 

The solution is deadlock free. 

Proof. From the proof of mutual exclusion, we can conclude that it is the module having the highest number that wins the competition. Since the numbers form a total order, a winner will be chosen in at most N instants (each digit can block one instant). This completes the proof.

PROPOSITION 3 

The solution is not livelock-free, assuming there are at least three modules.

Proof. Consider three modules $M_1$, $M_2$, $M_3$ having arbitration numbers

$$1a_2\cdots a_n, \qquad 1b_2\cdots b_n \qquad\text{and}\qquad 0c_2\cdots c_n$$

respectively, and assume that the numbers are distinct and are not composed of just 1’s or just 0’s, as discussed earlier. Consider the following scenario:

$M_1$ is already using the bus and $M_3$ is waiting on the first bit. Then, before $M_1$ finishes, $M_2$ can enter the competition and overtake $M_3$; now $M_1$ and $M_2$ can conspire so that $M_3$ never gets its turn.

That the solution is not livelock-free follows from this scenario.

Note: In the case when there are only two modules, assuming that the releasing process 
gives way to the process waiting on it, there is no possibility of a livelock. 

3.3 Solution with Explicit Master 

By introducing a sentinel value on the bus and another registering phase, we can obtain a solution that distinguishes the master from the master elect (i.e., the module that becomes the master as soon as the current master relinquishes the resource). The solution is given in the following. Note that, in the presence of preemption, the master elect has no fixed identity, since it can change before the master releases the main bus.

module Cell1:
input compete, grab;
output next, raise;

every immediate compete do emit next; emit raise; end
||
every immediate grab do emit raise end.

module Cell0:
input compete, raised, grab;
output next, raise;
                   result3/result, stuck3/stuck, grab3/grab]
   end signal
|| run line[signal raised1/raised, raise1/raise]
|| run line[signal raised2/raised, raise2/raise]
|| run line[signal raised3/raised, raise3/raise]
|| run line[signal raised4/raised, raise4/raise]
end.


4. Asynchronous solution 

The synchrony hypothesis of synchronous languages leads to the assumption that whenever the modules (or processes) are ready to take an action (reading or emitting), they will do so in the same instant. However, this assumption does not hold in the asynchronous solution: we cannot guarantee that processes enabled on some actions will really perform them at that time. In particular, the sequencing operator “;” cannot simply be compiled away; it may correspond to different delays in different implementations.

First, we describe a high-level solution in terms of C-code. Modules correspond to the 
C-functions shown in figure 2. 

The following questions naturally arise for a reader of the above protocol:

1. What is the need for TryTakeBus? Why not take the bus immediately?

2. What is the need for resetting lines after the competition?

3. What is the need for a global test of the bus? Is it possible to implement it as a loop that tests the lines one after the other?

4. Is a fairness condition assumed somewhere?

5. Are all arbitration numbers allowed?

6. What confidence can we have in such a program? Can we test or verify it?

We discuss possible answers to the above questions below. 

Question 1. Modules can compete for access to the bus at any time. Suppose mod1 reads the last line with the expected value, say 0. This means that each time it has read a line, no other module with higher priority had already changed this line. This does not mean that there is no module with higher priority competing for the bus! This could be the case if, say, mod0 enters the competition at the last moment. The global test of the bus rejects this situation.

Question 2. Resetting lines after working allows other modules with lower priorities to take the bus.

Question 3. The answer to the first question shows that the test of the bus must be global, that is, atomic. In the general case, there is no way to avoid a global test while respecting priorities.
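The scenario of Question 1 can be replayed step by step. The following C sketch is our own illustration with hypothetical names: `lines[i][j]` plays the role of the `Bus` array of section 4.2 (the value module i drives on line j), a line reads as the OR over all modules, and we assume mod1 (0010) checks its 0 lines before mod0 (1100) places its number.

```c
#include <assert.h>
#include <stdbool.h>

#define NMOD 2      /* two modules, as in the Question 1 scenario (assumption) */
#define NLINES 4    /* P = 4 lines */

/* lines[i][j]: the value module i currently drives on line j. */
static int lines[NMOD][NLINES];

/* A line reads as the OR of all modules' values on it. */
static int read_line(int j) {
    int v = 0;
    for (int i = 0; i < NMOD; i++)
        v |= lines[i][j];
    return v;
}

/* Module id places its whole number on the lines. */
static void put_number(int id, const int num[NLINES]) {
    assert(id >= 0 && id < NMOD);
    for (int j = 0; j < NLINES; j++)
        lines[id][j] = num[j];
}

/* Line-by-line check: every line where num has a 0 currently reads 0. */
static bool zero_lines_clear(const int num[NLINES]) {
    for (int j = 0; j < NLINES; j++)
        if (num[j] == 0 && read_line(j) != 0)
            return false;
    return true;
}

/* Global test: the whole bus shows exactly num. */
static bool bus_equals(const int num[NLINES]) {
    for (int j = 0; j < NLINES; j++)
        if (read_line(j) != num[j])
            return false;
    return true;
}
```

With mod1 = 0010 placed alone, both `zero_lines_clear` and `bus_equals` hold for mod1; once mod0 = 1100 has placed its number, the bus reads 1110 and `bus_equals` fails for mod1. Had mod1 relied only on its earlier line-by-line reads, it would have taken the bus while a higher-priority module was competing.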


Validation & analysis of the futurebus arbitration protocol


void module(ident, arbitnum)
int ident;
int arbitnum[];
{
    int val, Ok, rank, idcode, j;

    idcode = Code(arbitnum);
    for (;;) {
        /* Put my arbitration number */
        for (j = 0; j < HOWMANYLINES; j++) {
            if (arbitnum[j]) SetLine(j, ident);
        }

        /* Test for my identity */
        for (rank = 0; rank < HOWMANYLINES; rank++) {
            if (arbitnum[rank] == 0) {
                for (;;) {
                    val = ReadLine(rank, ident);
                    if (val == 1) {
                        /* Let others continue */
                        for (j = rank + 1; j < HOWMANYLINES; j++) {
                            if (arbitnum[j]) ResetLine(j, ident);
                        }
                    } else break;
                }
            }
        }

        /* Try to take the bus */
        Ok = TryTakeBus(Code(arbitnum), ident);
        if (Ok) {
            Work(ident);
            for (j = 0; j < HOWMANYLINES; j++) {
                if (arbitnum[j]) ResetLine(j, ident);
            }
            break;
        }
    }
}


Figure 2. The protocol in C. 
Question 4. There are possibilities of “livelocks”: consider, for example, two modules m0 and m1 with arbitration numbers 01 and 10, and the following possible interleaving:

• m0 puts 0; m0 puts 1; m0 reads 0;

• m1 puts 1;

• m0 reads 1; m0 tries to take the bus and fails (the value of the bus is 11).

• m1 puts 0; m1 tries to take the bus and fails (the value of the bus is 11).

This behaviour can be repeated forever, producing a livelock.

Question 5. A quick analysis shows that not all arbitration numbers are allowed. For example, consider two modules m0 and m1 with arbitration numbers 0 and 1. The following interleaving is possible: m0 puts a 0, reads the bus and sees a 0, so it can begin to work. But now m1 puts a 1 and reads a 1, so it can also begin to work. Access to the bus is thus incorrect, as two modules are working at the same time.

Question 6. Clearly, the answers to the previous questions imply that we cannot have any confidence in the protocol as presently stated.

To address this problem, we will:

• Characterize the arbitration numbers allowed. 

• Test the protocol by executing the code both in an asynchronous and in a synchronous 
way. 

• Give a formal specification of the protocol in Meije and analyze it with the verification 
tools auto and autograph. We will get a correctness proof of the protocol for the case 
N = P = 3, with 3 fixed arbitration numbers. 

4.1 Arbitration numbers allowed 

Consider condition C1: any two arbitration numbers are bitwise incomparable. For example, 01 and 10 verify C1. On the contrary, 11 and 10 do not verify C1.
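Condition C1 is easy to state on integers. In the following C sketch (illustrative only, with hypothetical function names), “a is bitwise below b” means that every line where a has a 1 also carries a 1 in b, and C1 requires this to fail in both directions for every pair of numbers.

```c
#include <assert.h>
#include <stdbool.h>

/* a is bitwise below b: every line where a has a 1, b has a 1 too. */
static bool bitwise_leq(unsigned a, unsigned b) {
    return (a & ~b) == 0u;
}

/* Condition C1: every pair of arbitration numbers is bitwise incomparable. */
static bool satisfies_c1(const unsigned *nums, int n) {
    assert(n >= 0);
    for (int i = 0; i < n; i++)
        for (int j = i + 1; j < n; j++)
            if (bitwise_leq(nums[i], nums[j]) || bitwise_leq(nums[j], nums[i]))
                return false;   /* one number's 1's are contained in the other's */
    return true;
}
```

The numbers 0 and 1 used in the answer to question 5 fail C1 (0 is bitwise below 1), as do 11 and 10; the numbers 100, 010 and 001 used later in section 5 satisfy it.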

We have the following property: correct access to the bus is equivalent to C1.

Proof. Suppose a wrong access to the bus by two modules m0 and m1. This means that one module, say m0, has taken the bus by seeing its arbitration number A0 on the bus, while m1 is also in competition and has already put some 1 values. As m1 also takes the bus, all 1’s in A0 must also be present in A1. So A0 is comparable to A1. Conversely, suppose that A0 and A1 are not comparable and one module, say m0, takes the bus first. Then m1 cannot also take it, as it will never see its arbitration number on it.

Now, let us consider priorities. Respect of priorities means that a module with lower 
priority cannot win if a module with higher priority is also in competition. So respect of 
priorities depends on the definition of being in competition. 

There are two distinct approaches: 

1. A module enters the competition after it has put the first 1. Call this “competition from 
the first”. 




2. A module enters the competition after it has put the last 1. Call this “competition from 
the last”. 

We have the following properties: 

1. For competitions from the last, C1 implies respect of priorities.

2. This is not the case for competitions from the first.

Proof. Suppose a situation where two modules m0 and m1 compete for the bus and m0 wins although it has a lower arbitration number. Then all the 1’s have already been put by m1 (because we are considering competition from the last), and they necessarily correspond to 1’s put by m0 (otherwise m0 could not have taken the bus). So m1’s arbitration number is less than that of m0, which violates the hypothesis.

For the second property, consider m0 and m1 with 110 and 101. These two arbitration numbers verify C1. Now consider the following interleaving: m0 puts 1; then m1 puts 1, then 0, then 1, then reads its arbitration number on the bus and takes it. This scenario does not respect priorities, as m0 has higher priority than m1.

Now let us define the condition C2 by: if two arbitration numbers have a 1 at the same 
place, they necessarily differ before this place. Examples: 110 and 101 do not verify C2. 
On the contrary, 110 and 011 do. 

We have the property: For competitions from the last, C2 implies respect of priorities. 
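Both the counterexample for competition from the first and condition C2 can be checked with a few lines of C. This is an illustrative sketch with hypothetical names: numbers are p-bit unsigned integers with the first line as the most significant bit, and `put_prefix` models a module that has so far placed only its first k digits.

```c
#include <assert.h>
#include <stdbool.h>

/* Digits of num (p bits wide) placed on the first k lines only. */
static unsigned put_prefix(unsigned num, int p, int k) {
    assert(0 <= k && k <= p && p < 32);
    return num & ~((1u << (p - k)) - 1u);
}

/* Condition C2 for a pair: wherever both numbers have a 1,
   they already differ on some earlier (more significant) line. */
static bool satisfies_c2(unsigned a, unsigned b, int p) {
    assert(p > 0 && p < 32);
    for (int i = p - 1; i >= 0; i--) {
        unsigned bit = 1u << i;
        unsigned before = ~((bit << 1) - 1u);   /* lines before this place */
        if ((a & bit) && (b & bit) && (a & before) == (b & before))
            return false;                       /* shared 1 with equal prefix */
    }
    return true;
}
```

If m0 = 110 has placed only its first 1 while m1 = 101 has placed all its digits, the bus reads 100 | 101 = 101, so m1 sees its own number although m0 has the higher one. Accordingly, 110 and 101 fail C2, while 110 and 011 satisfy it.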
4.2 The asynchronous version 

We define the bus as a reactive process that is used by the modules.

The bus  The bus is implemented as an array, with C functions to access it. The C code is the following:

/* Bus[i][j]: value put by module i on line j */
static int Bus[HOWMANYMODULES][HOWMANYLINES];

void ChangeBus(value, index, ident)
int value, index, ident;
{
    Bus[ident][index] = value;
}

/* A line reads 1 if any module drives it to 1 (wired-or) */
int ReadBus(index)
int index;
{
    int i;

    for (i = 0; i < HOWMANYMODULES; i++) {
        if (Bus[i][index] == 1) {
            return Bus[i][index];
        }
    }
    return 0;
}

/* Global test: compare the whole bus with the module's code */
int TryTakeBus(ident, code)
int ident, code;
{
    static int aux[HOWMANYLINES];
    int i;

    for (i = 0; i < HOWMANYLINES; i++) aux[i] = ReadBus(i);
    return code == Code(aux);
}
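The function `Code` used by `TryTakeBus` is not shown. A plausible definition, which is our assumption rather than the authors’ code, reads the array of line values as a binary number with line 0 as the most significant digit; since `TryTakeBus` only compares two codes for equality, any injective encoding of the line values would serve equally well. The value of `HOWMANYLINES` below is likewise assumed.

```c
#include <assert.h>

#define HOWMANYLINES 3   /* assumed, as in the N = P = 3 case of section 5 */

/* Hypothetical Code(): interpret the line values as a binary number,
   line 0 being the most significant digit. */
int Code(const int lines[HOWMANYLINES]) {
    int c = 0;
    for (int j = 0; j < HOWMANYLINES; j++) {
        assert(lines[j] == 0 || lines[j] == 1);  /* each line carries one bit */
        c = 2 * c + lines[j];
    }
    return c;
}
```

For example, the line values {1, 0, 0} encode 4 and {0, 1, 1} encode 3.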

We define a reactive process that implements the bus in the following way: 

rprocess int BUSIDENT(value, index, ident, order)
int value, index, ident, order;
{
    rauto int r, i, j;

    for (i = 0; i < HOWMANYMODULES; i++)
        for (j = 0; j < HOWMANYLINES; j++) Bus[i][j] = 0;

    for (;;) {
        switch (order) {
        case READ:   r = ReadBus(index); break;
        case CHANGE: ChangeBus(value, index, ident);
                     r = 0; break;
        case TRY:    r = TryTakeBus(ident, value); break;
        }
        stop r;
    }
}

Modules  Modules use the bus as a task:

rtask int Bus (value, index, ident, order) 
int value,index,ident,order; 

{ rprocess BUSIDENT MachineForBus; } 

To get one reaction of the bus, we have the following code: 

rproc int Once(value, index, ident, order)
int value, index, ident, order;
{
    into res activate Bus(value, index, ident, order);
    terminate(1);
    return res;
}

The code for a module is very similar to the C code given previously. In fact, the only change is the use of the bus through reactive procedures instead of C functions. We also declare variables as rauto so that they retain their values from one instant to the next, and we enclose the code in a loop to limit the number of times a module takes part in the competition. A module corresponds to the following RC code:

rproc void module(){
    rauto int val, Ok, rank, idcode, j, count;
    idcode = Code(arbitnum);

    for (count = 0; count < HOWMANYTIMES; count++) {
        for (;;) {
            /* Put my arbitration number */
            for (j = 0; j < HOWMANYLINES; j++) {
                if (arbitnum[j]) exec SetLine(j, ident);
            }
            printf("%d: bus request\n", ident);

            for (rank = 0; rank < HOWMANYLINES; rank++) {
                if (arbitnum[rank] == 0) {
                    for (;;) {
                        into val exec ReadLine(rank, ident);
                        if (val == 1) {
                            /* Let others continue */
                            for (j = rank + 1; j < HOWMANYLINES; j++) {
                                if (arbitnum[j]) exec ResetLine(j, ident);
                            }
                        } else break;
                    }
                }
            }

            /* Try to take the bus */
            into Ok exec TryTakeBus(Code(arbitnum), ident);
            if (Ok) {
                printf("%d: succeed to take bus\n", ident);
                exec Work(ident);
                for (j = 0; j < HOWMANYLINES; j++) {
                    if (arbitnum[j]) exec ResetLine(j, ident);
                }
                break;
            } else {
                printf("%d: failed to take bus\n", ident);
            }
        }
    }

    /* the end */
    printf("%d: finished\n", ident);
}

Notice that competition from the last is assumed, as “bus request” is printed after the last 1 has been put on the bus.

By running this code on several machines, we can test the protocol in asynchronous contexts. For example, we can run the bus on the machine cma and two modules on the same machine dance, and get the following result:

dance $ (mod0 cma&) ; (mod1 cma&)
[1] 9776
[1] 9778
dance$ 1: bus request
0: bus request
0: succeed to take bus
0: begins
0: ends
0: bus request
1: failed to take bus
1: bus request
0: failed to take bus
0: bus request
0: succeed to take bus
0: begins
0: ends
0: bus request
1: failed to take bus
1: bus request
0: failed to take bus
0: bus request
0: succeed to take bus
0: begins
0: ends
0: finished
1: failed to take bus
1: bus request
1: succeed to take bus
1: begins
1: ends
1: bus request
1: succeed to take bus
1: begins
1: ends
1: bus request
1: succeed to take bus
1: begins
1: ends
1: finished

5. Validation of the protocol

In this section, we analyze the protocol in the particular case where N = P = 3, with the 3 arbitration numbers 100, 010 and 001. We model lines, modules and the global system by three terms of the Meije process calculus. The main feature of Meije is the possibility of writing combined actions. This feature is used to model the global test of the bus; this would not be possible with a calculus that does not allow combined actions, such as CCS. The line is represented by the following term, which is shown in figure 3.

The line

line =
let rec {
      l111 = read1!:l111 + reset1?:l11
  and l11  = read1!:l11 + reset1?:l1 + put1?:l111
  and l1   = read1!:l1 + reset1?:l0 + put1?:l11
  and l0   = put1?:l1 + read0!:l0 } in l0 ;


Figure 3. The line.


The module  We have implemented the global test of the bus by the composite action “read1i? . read2j? . read3k? . success!”. We have three almost identical modules. For example, the module with arbitration number 100 is represented by the following term, which is shown in figure 4. It consists of 3 cells, one for each line, and in addition a fourth component that checks for success.

p100 =
(
  (ProcCell1[lb_9/follow, lb_7/next_failure, lb_6/release,
             lb_4/token, lb_2/next_result, next_failure/failure,
             put11/put1, read01/read0, read11/read1, reset11/reset1]
   //
   SuccessTest[lb_9/next_follow, lb_8/follow, lb_7/failure,
               lb_6/next_release, lb_5/next_result, lb_4/next_token,
               lb_3/release, lb_2/result, lb_1/token]
  ) \lb_9\lb_7\lb_6\lb_4\lb_2
  //
  (ProcCell0[lb_0/failure, follow/next_follow,
             next_result/result, release/next_release,
             token/next_token,
             next_follow/follow, result/next_result, put12/put1,
             read02/read0, read12/read1, next_release/release,
             reset12/reset1, next_token/token]
   //
   ProcCell0[lb_8/next_follow, lb_5/result, lb_3/next_release,
             lb_1/next_token, lb_0/next_failure, put13/put1,
             read03/read0, read13/read1, reset13/reset1]
  )\lb_0\follow\failure\next_result\release\token
)\lb_8\lb_5\lb_3\lb_1\next_failure\next_follow
 \result\next_release\next_token;


Figure 4. The module 100.
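Read as an automaton, the line term of figure 3 behaves like a bounded counter: its state l0, l1, l11 or l111 records how many modules currently drive a 1, put1? and reset1? move it up and down, and the line offers read0! only in l0 and read1! otherwise. The following C sketch of that reading is illustrative only; the names `enum action` and `line_step` are hypothetical, and the bound of three reflects the three modules of this case study.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical encoding of the line's actions. */
enum action { PUT1, RESET1, READ0, READ1 };

/* State count = number of 1's currently on the line (l0 = 0 ... l111 = 3).
   Returns true if the line can perform action a, updating *count. */
static bool line_step(int *count, enum action a) {
    assert(*count >= 0 && *count <= 3);
    switch (a) {
    case PUT1:                          /* no put1? in state l111 */
        if (*count < 3) { (*count)++; return true; }
        return false;
    case RESET1:                        /* no reset1? in state l0 */
        if (*count > 0) { (*count)--; return true; }
        return false;
    case READ0:                         /* read0! only in state l0 */
        return *count == 0;
    case READ1:                         /* read1! in l1, l11 and l111 */
        return *count > 0;
    }
    return false;
}
```

Starting from l0, the line answers read0, then after one put1 it answers read1 only; after three put1’s it refuses a fourth, and each reset1 moves it one state back.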

The cells can be of two types, ProcCell1 or ProcCell0, corresponding to:

ProcCell1 = let rec {
      tested  = reset1!:rest
  and empty   = next_release!:empty +
                failure?.next_failure!:rest +
                result?.next_result!.read1?:rest +
                result?.read0?.next_failure!:rest + release?:empty
  and full    = release?.reset1!:empty + follow?.next_follow!:full +
                failure?.next_failure!:tested +
                result?.next_result!.read1?:tested +
                result?.read0?.next_failure!:tested
  and started = next_token!:full
  and rest    = token?.put1!:started } in rest ;

and:

ProcCell0 = let rec {
      yread  = failure?.next_failure!:init +
               result?.read1?.next_failure!:init +
               result?.read0?.next_result!:init
  and nread  = read0?.next_follow!:yread
  and passed = read1?.next_release!:nread +
               read0?.next_follow!:yread
  and init   = release?.next_release!:init +
               follow?:passed + token?.next_token!:init } in init ;

The system  In the system, we put 3 modules and 3 lines in parallel. All actions except success are hidden, as can be seen in the parse term given below.

parse net = (
  (p100[unsuccess1/unsuccess, success1/success]
   // p010[unsuccess2/unsuccess, success2/success]
   // p001[unsuccess3/unsuccess, success3/success])
  //
  (line[put11/put1, read01/read0,
        read11/read1, reset11/reset1]
   // line[put12/put1, read02/read0,
           read12/read1, reset12/reset1]
   // line[put13/put1, read03/read0,
           read13/read1, reset13/reset1]) )
  \read03\read13\put13\reset13\put12\reset12
  \read02\read12\read01\read11\put11\reset11

Using AUTO, we can show that the observational automaton indeed has the required behaviour.

6. Conclusions 

In the previous sections, we have described synchronous and asynchronous solutions for the Futurebus arbitration protocol. Our analysis shows that the Esterel and RC frameworks are well suited to modelling, testing, validating and verifying the solutions. Further experimentation is necessary in this direction. In fact, in Clarke et al (1992), hardware description languages have been used to find discrepancies in the Futurebus Cache Protocols; we have also used Esterel effectively for the same purpose, with the same findings. To sum up,

• A synchronous solution is useful even for the distributed case. It gives the possibility 
of making tests by having control on interleavings. The advantage is that bugs can be 
reproduced and analyzed. 

• There must be a smooth transition between synchronous and asynchronous solutions. 
Hence, there is a need for a unified framework for asynchrony and synchrony. In this 
paper, we have used RC for coding and testing the synchronous and asynchronous 
versions with minimal changes. Currently, we are trying to use the recently proposed 
unified framework, CRP (Berry et al 1993; Shyamasundar 1993) for these aspects. 

• Formal and automated proofs can be done in the above frameworks. In this paper, we have shown the formal correctness of the protocol using auto/autograph.

This work leads us to think that distributed protocol design and coding presently need a heterogeneous approach, combining various techniques such as proofs made by hand, asynchronous and synchronous coding, process algebras, automata visualization and analysis with automated tools.


The work was partially supported by IFCPAR, Indo-French Center for the Promotion of 
Advanced Research, New Delhi, India. 
Appendix A: Esterel language

The basic object of Esterel without value passing, referred to as PURE Esterel, is the 
signal. Signals are used for communication with the environment as well as for internal 
communication. 

The programming unit is the module. A module has an interface that defines its input and output signals, and a body that is an executable statement:

module M:
input I1, I2;
output O1, O2;
input relations
statement
end module

Input relations can be used to restrict input events (Berry 1992). We shall only use exclusions, written in the interface part as

relation I1 # I2;

Such a relation states that input events cannot contain I1 and I2 together. It is therefore an assertion on the behavior of the asynchronous environment.

At execution time, a module is activated by repeatedly giving it an input event consisting 
of a possibly empty set of input signals assumed to be present and satisfying the input 
relations. The module reacts by executing its body and outputs the emitted output signals. 
We assume that the reaction is instantaneous or perfectly synchronous in the sense that the 
outputs are produced in no time. Hence, all necessary computations are also done in no time. 
In PURE Esterel, these computations are either signal emissions or control transmissions 
between statements; in full Esterel, they can be value computations and variable updates 
as well. The only statements that consume time are the ones explicitly requested to do so. 
The reaction is also required to be deterministic: for any state of the program and any input 
event, there is exactly one possible output event. In perfectly synchronous languages, a 
reaction is also called an instant. 

A.1 Statements

Esterel has two kinds of statements: the primitive or kernel statements, and the derived 
statements that can be expanded into primitive ones by macro-expansion and make the 
language more user-friendly. Derived statements are not semantically meaningful and will 
not be presented here. The list of kernel statements is: 

nothing
halt
stat1; stat2
loop stat end
present S then stat1 else stat2 end
do stat watching S
stat1 || stat2
trap T in stat end
exit T
signal S in stat end

The kernel statements are imperative in nature, and most of them are classical in appearance. The trap-exit constructs form an exception mechanism fully compatible with parallelism. Traps are lexically scoped. The local signal declaration “signal S in stat end” declares a lexically scoped signal S that can be used for internal broadcast communication within stat. The then and else parts are optional in a present statement. If omitted, they are assumed to be nothing.

A.2 Intuitive semantics 

At each instant, each interface or local signal is consistently seen as present or absent by all statements, ensuring determinism. By default, signals are absent; a signal is present if and only if it is an input signal emitted by the environment or a signal internally broadcast by executing an emit statement.

To explain how control propagates, it is better to first give examples using the simplest derived statement that takes time: the waiting statement “await S”, whose kernel expansion “do halt watching S” will be explained in a moment. When it starts executing, this statement simply retains the control up to the first future instant where S is present. If such an instant exists, the await statement terminates immediately; that is, the control is released instantaneously. If no such instant exists, then the await statement waits forever and never terminates. If two await statements are put in sequence, as in “await S1; await S2”, one just waits for S1 and S2 in sequence: control transmission by the sequencing operator “;” takes no time by itself. In the parallel construct “await S1 || await S2”, both await statements are started simultaneously, right away when the parallel construct is started. The parallel statement terminates exactly when its two branches are terminated, i.e. when the last of S1 and S2 occurs. Again, the “||” operator takes no time by itself.

Instantaneous control transmission appears everywhere. The nothing statement is purely transparent: it terminates immediately when started. An “emit S” statement is instantaneous: it broadcasts S and terminates right away, making the emission of S transient. In “emit S1; emit S2”, the signals S1 and S2 are emitted simultaneously. In a signal-presence test such as “present S ...”, the presence of S is tested for right away and the then or else branch is immediately started accordingly. In a “loop stat end” statement, the body stat starts immediately when the loop statement starts, and whenever stat terminates it is instantaneously restarted afresh (to avoid infinite instantaneous looping, the body of a loop is required not to terminate instantaneously when started).

The watching and trap-exit statements deal with behavior preemption, which is 
the most important feature of Esterel. 





F Boussinot et al 


In the watchdog statement “do stat watching S”, the statement stat is executed normally up to proper termination or up to future occurrence of the signal S, which is called the guard. If stat terminates strictly before S occurs, so does the whole watching statement; then the guard has no action. Otherwise, the occurrence of S provokes immediate preemption of the body stat and immediate termination of the whole watching statement. Consider for example the statement

do
   do
      await I1; emit O1
   watching I2;
   emit O2
watching I3

If I1 occurs strictly before I2 and I3, then the internal await statement terminates normally; O1 is emitted, the internal watching terminates since its body terminates, O2 is emitted, and the external watching also terminates since its body does. If I2 occurs before I1 or at the same time as it, but strictly before I3, then the internal watching preempts the await statement that should otherwise terminate, O1 is not emitted, O2 is emitted, and the external watching instantaneously terminates. If I3 occurs before I1 and I2 or at the same time as them, then the external watching preempts its body and terminates instantaneously, no signal being emitted. Notice how nesting watching statements provides for priorities.
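The three cases above can be checked against a small executable model. The following Python sketch is an illustration under stated assumptions, not Esterel semantics in general: it summarizes each input by the first future instant at which it occurs (None for never) and models only this particular statement; the function name `nested_watching` is hypothetical.

```python
import math
from typing import Optional, Set, Tuple


def nested_watching(t1: Optional[int], t2: Optional[int],
                    t3: Optional[int]) -> Tuple[float, Set[str]]:
    """Model of: do (do (await I1; emit O1) watching I2; emit O2) watching I3.

    Each argument is the first future instant at which the input occurs
    (None = never). Returns (instant at which the whole statement terminates,
    signals emitted at that instant); math.inf means it never terminates.
    """
    a, b, c = (math.inf if t is None else t for t in (t1, t2, t3))
    # The inner watching ends at a (body finished) or at b (preempted by I2),
    # whichever comes first; simultaneity counts as preemption.
    inner_end = min(a, b)
    # Strong preemption: if I3 occurs no later than that, the body does not
    # execute at all at that instant, so nothing is emitted.
    if c <= inner_end:
        return c, set()
    emitted = {"O2"}
    if a < b:  # await I1 terminated strictly before I2, so O1 was emitted
        emitted.add("O1")
    return inner_end, emitted
```

For instance, `nested_watching(2, 2, 5)` returns `(2, {"O2"})`, the second case of the text.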

We can now explain why “await S” is defined as “do halt watching S”. The semantics of halt is simple: it keeps the control forever and never terminates. When S occurs, halt is preempted and the whole construct terminates just as expected. Notice that halt is the only kernel statement that takes time by itself.

The trap-exit construct is similar to an exception handling mechanism, but with purely static scoping and concurrency handling. In “trap T in stat end”, the body stat is run normally until it executes an “exit T” statement. Then execution of stat is preempted and the whole trap construct terminates. The body of a trap statement can contain parallel components; the trap is exited as soon as one of the components executes an “exit T” statement, the other components being preempted. However, exit preemption is weaker than watching preemption, in the sense that concurrent components execute for a last time when exit occurs. Consider for example the statement

trap T in
   await I1; emit O1
||
   await I2; exit T
end

If I1 occurs before I2, then O1 is emitted and one waits for I2 to terminate. If I2 occurs before I1, then the first branch is preempted, the whole statement terminates instantaneously, and O1 will never be emitted. If I1 and I2 occur simultaneously, then both branches do execute and O1 is emitted. Preemption occurs only after execution at the concerned instant: by exiting a trap, a statement can preempt a concurrent statement, but it does leave it its “last wills”.
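The difference between weak (exit) and strong (watching) preemption shows up only when I1 and I2 are simultaneous. Below is a minimal Python model of this one statement, encoding each input by the first future instant at which it occurs (None for never), with a strong-preemption variant added for contrast; both function names are hypothetical and the model does not cover Esterel in general.

```python
import math
from typing import Optional, Tuple


def trap_example(t1: Optional[int], t2: Optional[int]) -> Tuple[float, bool]:
    """Model of: trap T in (await I1; emit O1) || (await I2; exit T) end.

    Returns (termination instant, whether O1 is ever emitted);
    math.inf means the statement never terminates.
    """
    a = math.inf if t1 is None else t1
    b = math.inf if t2 is None else t2
    # exit T is weak preemption: at instant b the first branch still runs
    # once, so O1 is emitted whenever I1 occurs at or before I2.
    emits_o1 = a != math.inf and a <= b
    # The trap ends only when the second branch exits.
    return b, emits_o1


def watching_variant(t1: Optional[int], t2: Optional[int]) -> Tuple[float, bool]:
    """Contrast: do (await I1; emit O1) watching I2 (strong preemption)."""
    a = math.inf if t1 is None else t1
    b = math.inf if t2 is None else t2
    # The body does not run at an instant where I2 occurs, so O1 needs a < b.
    return min(a, b), a < b
```

With simultaneous inputs, `trap_example(2, 2)` reports that O1 is emitted while `watching_variant(2, 2)` does not.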
Since we accept simultaneity, we must define what it means to exit several traps simultaneously, i.e. define priorities between traps. The rule is simple: only the outermost trap matters, the other ones being discarded. For example, in

trap T1 in
   trap T2 in
      exit T1
   ||
      exit T2
   end;
   emit O
end

the traps T1 and T2 are exited simultaneously, the internal trap T2 is discarded and O is not emitted.

Traps also provide a way of breaking loops, which would otherwise never terminate: 

trap T in
   loop ... exit T ... end
end


A.3 Value handling 

Since full Esterel will be informally used in the sequel, we briefly describe the way in 
which values are handled. 

Types can be either predefined like integer or be abstract like Time; abstract types 
are meant to be implemented in the host language in which a program is compiled, C or 
ADA for example. 

A signal can carry a value of a type declared in the signal declaration. A valued signal 
has a unique value at each instant. A signal value may change only when the signal is 
received from the environment or locally emitted with a new value, by executing “emit S(exp)”. The current value of a signal S is accessed at any time by the expression “?S”.

One can declare local variables by the statement 

var X in stat end 

Variables deeply differ from signals by the fact that they cannot be shared by concurrent statements. Variables are updated by instantaneous assignments “X:=exp” or by instantaneous side-effecting procedure calls “call P(...)”, where a procedure P is an external host-language piece of code that receives both value and reference arguments.

Expressions may involve variables, signal values “?S”, and external host-language function calls (external functions must not perform side effects). The computation of an expression is instantaneous. The “if exp then stat1 else stat2 end” statement instantaneously tests for the truth of exp.

Finally, occurrence counters can be added to preemption statements, as in “do stat watching 5 S”.




Sadhana, Vol. 21, Part 2, April 1996, pp. 213-228. © Printed in India.


Converting a Büchi alternating automaton to a usual nondeterministic one
Abstract. We first give a method for simulating, in the case of Büchi, an alternating automaton by a usual nondeterministic one. Then, in order to apply this result to the satisfiability problem of Linear Propositional Temporal Logic (LPTL), we give a method for translating any formula of this logic into an equivalent Büchi alternating automaton.

Keywords. Büchi alternating automaton; nondeterministic automaton; linear propositional temporal logic.

1. Introduction 

Automata on infinite sequences were introduced in the early sixties, first by Büchi (1960) and then by Muller (1963). Initially, the two types of automata on infinite words, Büchi automata and Muller automata, seemed not to have the same expressiveness. McNaughton (1966) showed that Büchi nondeterministic automata have the same expressive power as Muller automata.

The theory of automata on infinite sequences is used in many areas of computer science. 
The satisfiability problem of many logics, which consists of testing whether a formula is 
satisfiable, can be translated into the emptiness problem of an automaton on infinite words; 
that is, the problem of testing whether such an automaton accepts a nonempty language. 

In the early eighties, alternating automata were introduced as an extension to usual nondeterministic automata (Chandra et al 1981; Miyano & Ayashi 1984; Muller & Schupp 1985, 1987). In usual nondeterministic automata, states are only existential; what is new in alternating automata is that states can also be universal. The advantage of using alternating automata, especially as they are defined in (Muller & Schupp 1985, 1987), is that they offer a natural and straightforward way of translating a temporal formula into such an automaton (Isli 1994). Another advantage is that complementation is easy and requires linear time; it is performed by dualizing the transition function and complementing the accepting condition (complementation theorem (Muller & Schupp 1985, 1987)).

In this paper, we first give a method for simulating a Büchi alternating automaton by a usual nondeterministic one. Then we describe a method associating to any formula of Linear Propositional Temporal Logic (LPTL; Pnueli 1981) a Büchi alternating automaton accepting its models. This clearly leads to an alternating automata approach to the satisfiability problem of LPTL, which first translates the input formula into a Büchi alternating automaton and then simulates the alternating automaton by a usual nondeterministic one.


2. Büchi automata on ω-words: simulating an alternating automaton by a usual nondeterministic one

Alternating automata (Chandra et al 1981; Miyano & Ayashi 1984; Muller & Schupp 1985; Muller & Schupp 1987) have been introduced as an extension to usual nondeterministic automata. With the usual nondeterministic automata, states are only existential, while with the alternating automata the states can also be universal. As defined in (Chandra et al 1981; Miyano & Ayashi 1984), a state of an alternating automaton is either existential or universal, and cannot be intermediate; if $\Sigma$ and $Q$ are the alphabet and the set of states, respectively, then the transition function maps a pair $(a, q)$ of $\Sigma \times Q$ into a subset of $Q$ which has to be interpreted either universally or existentially. The definition of alternation in (Muller & Schupp 1985, 1987) is more natural, and allows states to be, say, existential-then-universal: the transition function maps $(a, q)$ into a set of subsets of $Q$¹, the set being existential and each subset universal (first choose, nondeterministically, a subset, and then all the states of the chosen subset). Another advantage of defining alternating automata as in (Muller & Schupp 1985, 1987) is that translating a temporal formula into such an automaton is a straightforward and extremely easy task (the translation is given in § 3).

A run of an alternating automaton on an infinite word is not necessarily an infinite sequence of states, as is the case for usual nondeterministic automata, but, in general, an infinite tree. The accepting condition concerns not only a unique infinite sequence but each branch of the tree form of the run.

In this section, we propose, in the case of Büchi, an effective construction mapping an alternating automaton into a usual nondeterministic one simulating it. This reduces the emptiness problem of the former automaton to the well-known emptiness problem of the latter.

2.1 Alternating automata 

For reasons mentioned above, our definition of alternating automata follows (Muller & 
Schupp 1985,1987). 

DEFINITION 1 

Let $\Sigma$ be a countable set. The free distributive lattice $\mathcal{L}(\Sigma)$ generated by $\Sigma$ is defined as the least set satisfying:

(1) each element of $\Sigma$ belongs to $\mathcal{L}(\Sigma)$, and

(2) if $f$ and $g$ belong to $\mathcal{L}(\Sigma)$, then so do both $f \wedge g$ and $f \vee g$.
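The "set of subsets" reading of a lattice element used in this section can be computed by distributing $\wedge$ over $\vee$. Below is a minimal Python sketch, assuming terms are nested tuples over string generators (a hypothetical encoding; the name `dnf` is also ours). Because $x \vee (x \wedge y) = x$ in a distributive lattice, it also drops absorbed disjuncts; whether the paper's disjunctive normal form is meant in this reduced sense is an assumption.

```python
from typing import FrozenSet, Tuple, Union

# A lattice term is a generator (a string) or a tuple (op, left, right),
# with op in {"and", "or"}.
Term = Union[str, Tuple[str, "Term", "Term"]]
Clause = FrozenSet[str]


def dnf(term: Term) -> FrozenSet[Clause]:
    """Disjunctive normal form of a term of L(Sigma), as a set of subsets of Sigma."""
    if isinstance(term, str):
        clauses = {frozenset([term])}
    else:
        op, left, right = term
        lhs, rhs = dnf(left), dnf(right)
        if op == "or":
            clauses = set(lhs | rhs)
        elif op == "and":
            # distribute: (x1 v x2) ^ (y1 v y2) = (x1^y1) v (x1^y2) v (x2^y1) v (x2^y2)
            clauses = {x | y for x in lhs for y in rhs}
        else:
            raise ValueError(f"unknown operator: {op!r}")
    # absorption x v (x ^ y) = x: drop any clause strictly containing another one
    return frozenset(c for c in clauses if not any(d < c for d in clauses))
```

For example, `("and", "q1", ("or", "q2", "q3"))` yields the two disjuncts $\{q_1, q_2\}$ and $\{q_1, q_3\}$.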


¹In fact, the transition function maps $(a, q)$ into an element of the free distributive lattice $\mathcal{L}(Q)$ generated by $Q$; the disjunctive normal form of this element of $\mathcal{L}(Q)$, written in set form, is a set of subsets of $Q$.
DEFINITION 2

A Büchi alternating automaton on infinite words is a 5-tuple $M = (\mathcal{L}(Q), \Sigma, \delta, q_0, F)$ defined in the following manner:

(1) $Q$ is a finite state set,

(2) $\mathcal{L}(Q)$ is the free distributive lattice generated by $Q$,

(3) $\Sigma$ is the input alphabet,

(4) $\delta : \Sigma \times Q \to \mathcal{L}(Q)$ is a transition function,

(5) $q_0 \in Q$ is the initial state of $M$, and

(6) $F$ is a set defining the accepting condition.

The set $F$ defining the accepting condition is defined in the same fashion as in the usual nondeterministic case. That is, $F$ is a subset of $Q$, called the set of distinguished states.

DEFINITION 3 

A run of a Büchi alternating automaton $M = (\mathcal{L}(Q), \Sigma, \delta, q_0, F)$ on an infinite word $u = a_0 a_1 \ldots \in \Sigma^\omega$ is an infinite labeled tree, with no leaf, verifying the following:

(1) the labels (of the nodes) belong to the state set of $M$,

(2) the label of the root is $q_0$, the initial state of $M$, and

(3) if

- $v$ is a node of level $n$ labeled by $q$,

- $a_n$ is the $n$th letter of the word $u$,

- $\{q_1, \ldots, q_m\}$ is the set of labels of the immediate successors of $v$, and

- $t_1 \vee t_2 \vee \cdots \vee t_r$ is the disjunctive normal form of $\delta(a_n, q)$,

then there exists $j \in \{1, \ldots, r\}$ verifying $\{q_1, \ldots, q_m\} = \{p \in Q : p \text{ occurs in } t_j\}$; that is, the labels of the immediate successors of $v$ are exactly the states occurring in a certain disjunct, $t_j$, of the disjunctive normal form of $\delta(a_n, q)$.
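Condition (3) is a purely local check on each node of the tree. Below is a minimal Python sketch, assuming the disjunctive normal forms of $\delta$ are precomputed as sets of sets of states; the dictionary layout and the name `node_ok` are hypothetical.

```python
from typing import Dict, FrozenSet, Iterable, Sequence, Tuple

# delta_dnf[(a, q)] is the DNF of delta(a, q): a set of disjuncts (sets of states).
DeltaDNF = Dict[Tuple[str, str], FrozenSet[FrozenSet[str]]]


def node_ok(delta_dnf: DeltaDNF, word: Sequence[str], level: int,
            label: str, child_labels: Iterable[str]) -> bool:
    """Condition (3) of Definition 3 for one node of a run tree."""
    a_n = word[level]  # n-th letter of u (a finite prefix of u suffices)
    children = frozenset(child_labels)
    # Some disjunct t_j must contain exactly the states labelling the children.
    return any(children == t_j for t_j in delta_dnf[(a_n, label)])
```

Duplicate child labels are harmless, since only the set of labels matters.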

Figure 1 illustrates part (3) of definition 3. 

Uniform run 

A run of a Büchi alternating automaton on an infinite word is said to be uniform if, for all nodes $v_1$ and $v_2$ at the same level and having the same label, and for every state $q$ of the automaton, the following condition holds: $v_1$ has an immediate successor labelled by $q$ if and only if $v_2$ has an immediate successor labelled by $q$.

History 

A history of a Büchi alternating automaton is an infinite sequence $q_{i_0} q_{i_1} \ldots$ of states of the automaton.





Amor Isli 


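Figure 1. The immediate successors of a node in a run: a node at level $n$ labelled by $q$ has, at level $n+1$, immediate successors labelled by $q_1, \ldots, q_m$, where $\delta(a_n, q) = t_1 \vee t_2 \vee \cdots \vee t_r$ and $\{q_1, \ldots, q_m\} = \{p \in Q : p \text{ occurs in } t_j\}$ for a certain $j \in \{1, \ldots, r\}$.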

History lying on a branch 

Let $M$ be a Büchi alternating automaton, and $t$ a run of $M$ on an infinite word $u$. A branch of $t$ is an infinite path starting at its root. The history lying on a branch $\beta$ of the run $t$ is the history $q_{i_0} q_{i_1} \ldots q_{i_j} \ldots$ of the automaton $M$ such that its $j$th element $q_{i_j}$, $j \ge 0$, is the label of the $j$th node of the branch $\beta$.

Accepting history 

To a history $h$ of a Büchi alternating automaton $M = (\mathcal{L}(Q), \Sigma, \delta, q_0, F)$ are associated:

- the mapping $c_h : \mathbb{N} \to Q$ such that $c_h(n)$ is the $n$th letter of $h$, and

- the set $\mathrm{Inf}(h) = \{q \in Q : |c_h^{-1}(q)| = \omega\}$, where $c_h^{-1}(q) = \{n \in \mathbb{N} : c_h(n) = q\}$.

Then a history is accepting if $\mathrm{Inf}(h) \cap F \neq \emptyset$.
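For an ultimately periodic history $h$ = prefix · loop$^\omega$, $\mathrm{Inf}(h)$ is simply the set of states in the loop, so the acceptance test becomes finite. Below is a minimal Python sketch under that assumption (general histories need not be periodic); the function names are hypothetical.

```python
from typing import Hashable, Iterable, Sequence, Set


def inf_states(prefix: Sequence[Hashable], loop: Sequence[Hashable]) -> Set[Hashable]:
    """Inf(h) for the ultimately periodic history h = prefix . loop^omega."""
    if not loop:
        raise ValueError("the periodic part of an infinite history must be nonempty")
    # Every state of the loop occurs infinitely often; a state occurring only
    # in the prefix occurs finitely often.
    return set(loop)


def is_accepting(prefix: Sequence[Hashable], loop: Sequence[Hashable],
                 F: Iterable[Hashable]) -> bool:
    """Büchi acceptance: Inf(h) and F intersect."""
    return bool(inf_states(prefix, loop) & set(F))
```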

Accepting run 

A branch of a run is accepting if the history lying on it is accepting. A run is accepting if 
all its branches are accepting. 

An infinite word $u$ is accepted by a Büchi alternating automaton $M$ if there exists an accepting run of $M$ on $u$. The language accepted by $M$, denoted by $L(M)$, is the set of infinite words accepted by $M$.

Remark 1. Let $t$ be a uniform run of a Büchi alternating automaton $M$ on an infinite word. If $v_1$ and $v_2$ are two nodes at the same level and with the same label, then a history lies on a branch of the subtree $t/v_1$ of $t$ at $v_1$ if and only if it lies on a branch of the subtree $t/v_2$ of $t$ at $v_2$.
Theorem 1 below shows that in an alternating automaton we can, without loss of generality, restrict our attention to uniform runs only.

Theorem 1. Let $M$ be an alternating automaton. For each word $u$ accepted by $M$, there exists an accepting uniform run of $M$ on $u$.

Proof. See appendix A. □

2.2 Run DAG of an alternating automaton 

We now need, in order to get our simulation method, to transform a uniform run of a Büchi alternating automaton from its tree form into a directed acyclic graph (DAG) form. This new form of a run will be called a run DAG, and is defined in the following manner. A run DAG of a Büchi alternating automaton $M$ on an infinite word is the quotient of a uniform run $t$ of $M$ on that word modulo the following equivalence relation $R_t$ defined on the set of nodes of $t$:

$v_1 \mathrel{R_t} v_2$ if and only if $v_1$ and $v_2$ are at the same level and labelled by the same state.

2.2a Distinguished levels of a run DAG: Let $G$ be a run DAG of a Büchi alternating automaton $M = (\mathcal{L}(Q), \Sigma, \delta, q_0, F)$ on an infinite word $u$. $G$ is said to be accepting if the uniform run of which it is a quotient is itself accepting. It is easily seen that $G$ is accepting if and only if on every infinite path leaving the root we meet infinitely often an element of $F$. Let us now traverse $G$.

Let $n_0$ be the least level of $G$ such that:

every path joining the root to a node of level $n_0$ has at least one label from $F$.

Let $n_{i+1}$ be the least level such that:

1. $n_i < n_{i+1}$, and

2. every path starting at a node of level $(n_i + 1)$ and terminating at a node of level $n_{i+1}$ has at least one label from $F$.

In the sequel, such levels $n_i$ ($i \ge 0$) are called distinguished levels of $G$.
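The definition of distinguished levels suggests a single left-to-right pass over the levels of a run DAG. Below is a minimal Python sketch, assuming a finite prefix of a run DAG is given level by level (the representation and the name `distinguished_levels` are hypothetical). The `pending` set, holding the nodes reached inside the current segment by a path that has not yet met $F$, is our own device for keeping track of the states met; it is not necessarily the bookkeeping used in the paper.

```python
from typing import Dict, List, Sequence, Set


def distinguished_levels(levels: Sequence[Set[str]],
                         edges: Sequence[Dict[str, Set[str]]],
                         F: Set[str]) -> List[int]:
    """Distinguished levels n_0 < n_1 < ... lying within a finite run-DAG prefix.

    levels[n] is the set of states labelling the nodes of level n (levels[0] = {q0});
    edges[n][q] is the set of labels of the successors, at level n + 1, of node (n, q).
    """
    found: List[int] = []
    pending: Set[str] = set()  # nodes reached by an F-free path of the current segment
    for n, states in enumerate(levels):
        if n == 0 or (found and found[-1] == n - 1):
            # A segment starts at the root or at level n_i + 1:
            # only the node's own label counts.
            pending = {q for q in states if q not in F}
        else:
            # Extend each F-free path by one edge; it stays F-free only
            # if the new node is not in F.
            pending = {q for q in states if q not in F
                       and any(q in edges[n - 1].get(p, set()) for p in pending)}
        if not pending:  # every path of the segment ending at level n met F
            found.append(n)
    return found
```

On an infinite run DAG, the pass would find infinitely many such levels exactly when the DAG is accepting; a finite prefix can only exhibit the first few.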

Theorem 2. A run DAG of a Büchi alternating automaton is accepting if and only if the number of its distinguished levels is infinite.

Proof. Straightforward. Left to the reader. □ 

2.2b Characterizing the distinguished levels: The problem to deal with now is how 
to characterize the distinguished levels of a run DAG. The solution we propose to solve 
this problem consists of keeping track of the states met while traversing a run DAG, and 
In the theorem, the notation $\mathit{transition}(a, Q_1)$ stands for the set of subsets of $Q \times \{0, 1\}$ defined as follows:

$S \in \mathit{transition}(a, Q_1) \iff \exists u$ (infinite word), $n$ (nonnegative integer) such that $Q_1 = u_\sigma(n)$, $S = u_\sigma(n+1)$, and $u = b_0 b_1 \ldots b_n \ldots$ with $b_n = a$.

Theorem 3. Let $M = (\mathcal{L}(Q), \Sigma, \delta, q_0, F)$ be a Büchi alternating automaton. The usual Büchi nondeterministic automaton $B = (\mathcal{Q}, \Sigma, \delta', Q_0, F')$ defined as follows:

(1) $\mathcal{Q}$ = the set of subsets of $Q \times \{0, 1\}$,

(2) $\delta'(a, Q_1) = \mathit{transition}(a, Q_1)$, for all $(a, Q_1) \in \Sigma \times \mathcal{Q}$,

(3) $Q_0 = \{(q_0, 1)\}$ if $q_0 \in F$, and $Q_0 = \{(q_0, 0)\}$ otherwise, and

(4) $F' = 2^{Q \times \{1\}}$,

simulates $M$.


Proof. Straightforward from the characterization of the distinguished levels of a run DAG, 
described in § 2.2b. □ 

2.4 Size of the simulating automaton 


Let $M = (\mathcal{L}(Q), \Sigma, \delta, q_0, F)$ be a Büchi alternating automaton, and $n$ the cardinality of $Q$. From the construction of the nondeterministic automaton $B = (\mathcal{Q}, \Sigma, \delta', Q_0, F')$ simulating $M$, in particular the construction of its transition function, it follows that for any state $q$ of $M$, and for any reachable state $Q_1$ of $B$, $(q, 0)$ and $(q, 1)$ cannot be simultaneously in $Q_1$. It follows that the number of reachable states in the state set $\mathcal{Q}$ of $B$ is bounded by the number of mappings $f$ from $Q$ into $\{0, 1, 2\}$ defined as follows:

$$f(q) = \begin{cases} 0 & \text{if } (q, 0) \in Q_1, \\ 1 & \text{if } (q, 1) \in Q_1, \\ 2 & \text{otherwise.} \end{cases}$$

The number of such mappings is clearly $3^n$.

The following two points imply that the actual bound is better than $3^n$:

for all $q \in F$, and for all $Q_1 \in \mathcal{Q}$: $(q, 0) \notin Q_1$.


In summary, given a Büchi alternating automaton $M$ whose state set is of size $n$, and whose set of distinguished states is of size $d$, the size of the Büchi nondeterministic automaton simulating $M$, constructed by theorem 3, is bounded by $2^d 3^{n-d}$, i.e. $(2/3)^d 3^n$.
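The bound $2^d 3^{n-d}$ is a count of tag maps in which distinguished states never receive tag 0. Below is a brute-force Python check of that arithmetic for small $n$, assuming states are numbered $0, \ldots, n-1$ with the first $d$ of them distinguished (a hypothetical encoding; the function names are also ours).

```python
from itertools import product


def bound(n: int, d: int) -> int:
    """Upper bound 2^d * 3^(n-d) on the size of the simulating automaton."""
    return 2 ** d * 3 ** (n - d)


def count_tag_maps(n: int, d: int) -> int:
    """Count maps f: Q -> {0, 1, 2} with f(q) != 0 for every distinguished q.

    Hypothetical encoding: states are 0..n-1 and the first d of them form F.
    """
    return sum(1 for f in product((0, 1, 2), repeat=n)
               if all(f[q] != 0 for q in range(d)))
```

For example, with $n = 4$ and $d = 2$ both functions give $36 = (2/3)^2 \cdot 81$.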


3. Mapping an LPTL formula into a Büchi alternating automaton

To apply the result of the last section on the emptiness problem of Büchi alternating automata to the satisfiability problem of LPTL, we investigate in this section the problem of mapping an LPTL formula into a Büchi alternating automaton accepting its models.

3.1 Background: the logic LPTL 

LPTL is an extension of classical propositional logic. This extension is obtained by adding the temporal operators $\bigcirc$ (the “next”), $\Diamond$ (the “eventually”) and $U$ (the “until”).

Syntax 

LPTL formulas are built from the following alphabet:

- a countable set $V$ of atomic propositions $p, q, r$,

- the boolean connectives $\wedge$ and $\neg$, and

- the temporal operators $\bigcirc$, $\Diamond$ and $U$.

The set of LPTL (well formed) formulas is defined as the least set verifying:

(a) every atomic proposition $p \in V$ is a formula, and

(b) if $f$ and $g$ are formulas, then so are $f \wedge g$, $\neg f$, $\bigcirc f$, $\Diamond f$ and $f U g$.

Remark 2. The temporal operator $G$ (the “always”) is used as an abbreviation: $G f \equiv \neg \Diamond \neg f$.


Semantics 

LPTL is complete for the class $\mathcal{K}$ of structures $\xi = (S, N, \pi)$ defined as follows (Manna & Wolper 1984; Wolper 1983):

(a) $S$ is a countable state set,

(b) $N : S \to S$ is a successor function mapping each state $s$ into a unique successor state $N(s)$, and

(c) $\pi : S \to 2^V$ is a function mapping each state $s$ into a set of atomic propositions.

Remark 3. In a structure $\xi = (S, N, \pi)$, the function $\pi$ partitions, in each state $s \in S$, the set $V$ of atomic propositions into the set $\pi(s)$ of atomic propositions true in $s$, and the set $V \setminus \pi(s)$ of atomic propositions false in $s$. Hence, $\pi$ is a function assigning truth values to atomic propositions in each state.

Satisfiability 


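Let $\xi = (S, N, \pi)$ be a structure of the class $\mathcal{K}$, and $s$ a state from $S$. The satisfiability of an LPTL formula $f$ by the state $s$ of the structure, denoted by $\langle \xi, s \rangle \models f$, is defined recursively as follows:

(a) if $f$ is an atomic proposition then: $\langle \xi, s \rangle \models f$ if and only if $f \in \pi(s)$,

(b) $\langle \xi, s \rangle \models f_1 \wedge f_2$ if and only if $\langle \xi, s \rangle \models f_1$ and $\langle \xi, s \rangle \models f_2$,

(c) $\langle \xi, s \rangle \models \neg f_1$ if and only if not($\langle \xi, s \rangle \models f_1$),

(d) $\langle \xi, s \rangle \models \bigcirc f_1$ if and only if $\langle \xi, N(s) \rangle \models f_1$,

(e) $\langle \xi, s \rangle \models \Diamond f_1$ if and only if $(\exists i \ge 0)(\langle \xi, N^i(s) \rangle \models f_1)$, and

(f) $\langle \xi, s \rangle \models f_1 U f_2$ if and only if

• $(\forall i \ge 0)(\langle \xi, N^i(s) \rangle \models f_1)$, or

• $(\exists i \ge 0)(\langle \xi, N^i(s) \rangle \models f_2$ and $\forall j\,(0 \le j < i \Rightarrow \langle \xi, N^j(s) \rangle \models f_1))$.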

Remark 4. In the definition above of satisfiability:

• $N^0(s)$ stands for $s$, and

• $N^{i+1}(s)$, $i \ge 0$, stands for $N(N^i(s))$.

An interpretation $\iota$ consists of a structure $\xi = (S, N, \pi)$ and an initial state $s_0$: it is denoted by

$\iota = (\xi, s_0)$.

An interpretation $\iota = (\xi, s_0)$ satisfies a formula $f$ if and only if

$\langle \xi, s_0 \rangle \models f$.

An interpretation satisfying a formula is a model of that formula.
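Clauses (a) to (f) can be evaluated mechanically when the interpretation is ultimately periodic, i.e. when the sequence of sets $\pi(N^i(s_0))$ has the form prefix · loop$^\omega$. That restriction, the tuple encoding of formulas and the name `holds` are assumptions of this minimal Python sketch. Because only finitely many distinct positions exist, $\Diamond$ is computed as the least solution and $U$ as the greatest solution of its one-step unfolding, which matches clause (e) and the two-case clause (f) respectively.

```python
from typing import Callable, List, Sequence, Set, Tuple

# Hypothetical encoding: ("ap", p), ("not", f), ("and", f, g),
# ("next", f), ("ev", f), ("until", f, g).
Formula = Tuple


def holds(formula: Formula, prefix: Sequence[Set[str]],
          loop: Sequence[Set[str]]) -> bool:
    """Does the interpretation prefix . loop^omega satisfy the formula at s_0?"""
    if not loop:
        raise ValueError("loop must be nonempty")
    labels = list(prefix) + list(loop)  # pi of each distinct position
    size = len(labels)

    def succ(i: int) -> int:  # the successor function N on positions
        return i + 1 if i + 1 < size else len(prefix)

    def fixpoint(step: Callable[[int, List[bool]], bool], init: bool) -> List[bool]:
        vals = [init] * size  # all False: least solution; all True: greatest
        while True:
            new = [step(i, vals) for i in range(size)]
            if new == vals:
                return vals
            vals = new

    def truth(f: Formula) -> List[bool]:
        op = f[0]
        if op == "ap":
            return [f[1] in labels[i] for i in range(size)]
        if op == "not":
            return [not v for v in truth(f[1])]
        if op == "and":
            return [x and y for x, y in zip(truth(f[1]), truth(f[2]))]
        if op == "next":
            v = truth(f[1])
            return [v[succ(i)] for i in range(size)]
        if op == "ev":  # least solution of X = f or next X
            v = truth(f[1])
            return fixpoint(lambda i, X: v[i] or X[succ(i)], False)
        if op == "until":  # greatest solution of X = f2 or (f1 and next X)
            v1, v2 = truth(f[1]), truth(f[2])
            return fixpoint(lambda i, X: v2[i] or (v1[i] and X[succ(i)]), True)
        raise ValueError(f"unknown operator: {op!r}")

    return truth(formula)[0]
```

With this reading, `("until", p, q)` holds on an interpretation where `p` is always true and `q` never is, as the first case of clause (f) requires.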

The satisfiability problem for LPTL consists of answering the question of whether a 
given formula of this logic is satisfiable; that is, whether it has a model. 

3.2 Büchi automata on interpretations

The automata we are concerned with in this section are automata on interpretations, that is, automata on infinite sequences of sets of atomic propositions.

Given a set $A$ of literals built from a set $V$ of atomic propositions, we define $\neg A$ as being the following set of literals:

$\neg A = \{p : p \in V \text{ and } \neg p \in A\} \cup \{\neg p : p \in V \cap A\}$.
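As a small illustration, the definition of $\neg A$ transcribes directly into Python. The function name is hypothetical, and the encoding "p" for a positive literal and "~p" for a negative literal is an assumption of this sketch.

```python
from typing import Set


def neg_literals(A: Set[str], V: Set[str]) -> Set[str]:
    """The set ¬A for a set A of literals over V ("p" positive, "~p" negative)."""
    return {p for p in V if "~" + p in A} | {"~" + p for p in V if p in A}
```

For instance, with $V = \{c, e, j\}$ and $A = \{c, \neg j\}$ it returns $\{\neg c, j\}$.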

DEFINITION 4 

A Büchi nondeterministic automaton on interpretations is a 5-tuple $M = (Q, \Sigma, \delta, q_0, F)$ defined as follows:

- $\Sigma$ is the input alphabet: $\Sigma = V \cup \neg V$, $V$ being a countable set of atomic propositions,

- $Q$ is a finite state set,

- $\delta : Q \to 2^{2^\Sigma \times Q}$ is a transition function,

- $q_0 \in Q$ is the initial state of $M$, and

- $F \subseteq Q$ is a set of distinguished states.
$q_0$: $\{(\emptyset, q_1)\}$

$q_1$: $\{(\emptyset, q_2), (\emptyset, q_3)\}$

$q_2$: $\{(\emptyset, q_2), (\{\neg c, j\}, q_4), (\{\neg e, j\}, q_4), (\{\neg c, j\}, q_5), (\{\neg e, j\}, q_5)\}$

$q_3$: $\{(\emptyset, q_2), (\emptyset, q_3), (\emptyset, q_6)\}$

$q_4$: $\{(\emptyset, q_2), (\{\neg c, j\}, q_3), (\{\neg e, j\}, q_3)\}$

$q_5$: $\{(\emptyset, q_2), (\emptyset, q_3)\}$

$q_6$: $\{(\emptyset, q_2), (\{\neg c, j\}, q_3), (\{\neg e, j\}, q_3), (\emptyset, q_6)\}$

Figure 2. The transition function $\delta$ of the automaton $M$ of the example.

The transition function maps each state into a subset of $2^\Sigma \times Q$; that is, $\delta(q)$ has the form $\{(A_{i_1}, q_{j_1}), \ldots, (A_{i_m}, q_{j_m}) \mid A_{i_k} \subseteq \Sigma, q_{j_k} \in Q, \forall k = 1, \ldots, m\}$. As shown in the definition below, if $(A, q') \in \delta(q)$ then the intuitive meaning of the literals in $A$ is the following: the positive literals give atomic propositions that are necessary, and the negative literals give the atomic propositions that are forbidden.

DEFINITION 5 

A run of a Büchi nondeterministic automaton $M = (Q, \Sigma, \delta, q_0, F)$ on an interpretation $\iota = e_0 e_1 e_2 \ldots e_n \ldots$ is an infinite sequence $c = q_{i_0} q_{i_1} q_{i_2} \ldots q_{i_n} \ldots$ of states of $M$ verifying the following:

1. $q_{i_0} = q_0$,

2. $\forall n \ge 0\ \exists (A, q) \in \delta(q_{i_n})$ such that:

(a) $q = q_{i_{n+1}}$, and

(b) for every atomic proposition $p$:

i. $(p \in A) \Rightarrow (p \in e_n)$,

ii. $(\neg p \in A) \Rightarrow (p \notin e_n)$.

Point 2(b) says intuitively that the positive literals of $A$ are necessary, and that the negative ones forbid their corresponding atomic propositions. The presence or absence of an atomic proposition for which neither of the two corresponding literals (neither the positive nor the negative) belongs to $A$ is irrelevant.


Example. Let us consider the following Büchi nondeterministic automaton $M = (Q, \Sigma, \delta, q_0, F)$:

- $\Sigma = V \cup \neg V$, with $V = \{c, e, j\}$,

- $Q = \{q_0, \ldots, q_6\}$,

- the initial state is $q_0$,

- the transition function is given by the table of figure 2, and

- $F = \{q_0, q_1, q_4, q_5\}$.
Figure 3. The graphical representation of the automaton M of the example.

The graphical representation of the automaton M is given by figure 3. The dashed states 
are the distinguished states of M. The incoming arrow shows the initial state. 

The infinite sequence $q_0 q_1 (q_2 q_4)^\omega$ is a run of the automaton $M$ on each of the interpretations $\{c, e, j\}^2 (\{c, j\}\{e, j\})^\omega$, {c, e}{c, y})" and $\{j\}\{c\}(\{j\}\{c, e, j\})^\omega$. The run $q_0 q_1 (q_2 q_4)^\omega$ is accepting, for it repeats infinitely often the distinguished state $q_4$.
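Definition 5 can be checked mechanically on a finite prefix of a lasso-shaped interpretation. The minimal Python sketch below is illustrative only: the transition fragment `DELTA` is our reading of the damaged Figure 2, restricted to the pairs used by the run $q_0 q_1 (q_2 q_4)^\omega$, so treat it as an assumption; the names `letter_ok` and `is_run_prefix` and the "~p" encoding of negative literals are hypothetical.

```python
from typing import Dict, FrozenSet, Sequence, Set, Tuple

# Assumed fragment of delta: only the pairs used by the run q0 q1 (q2 q4)^omega.
# Literals: "p" is positive, "~p" negative; frozenset() is the empty literal set.
DELTA: Dict[str, Set[Tuple[FrozenSet[str], str]]] = {
    "q0": {(frozenset(), "q1")},
    "q1": {(frozenset(), "q2")},
    "q2": {(frozenset({"~c", "j"}), "q4"), (frozenset({"~e", "j"}), "q4")},
    "q4": {(frozenset(), "q2")},
}


def letter_ok(A: FrozenSet[str], e: Set[str]) -> bool:
    """Point 2(b): positive literals are required, negative ones forbidden."""
    return all((lit[1:] not in e) if lit.startswith("~") else (lit in e)
               for lit in A)


def is_run_prefix(states: Sequence[str], interp: Sequence[Set[str]],
                  q0: str = "q0") -> bool:
    """Definition 5 on a finite prefix: states[n] -> states[n+1] reading interp[n]."""
    if states[0] != q0:
        return False
    return all(any(q == states[n + 1] and letter_ok(A, interp[n])
                   for A, q in DELTA.get(states[n], ()))
               for n in range(len(states) - 1))
```

Unrolling the loops a few times, the run prefix checks out on $\{j\}\{c\}(\{j\}\{c, e, j\})^\omega$, while an interpretation that lacks $j$ when $q_2$ must move to $q_4$ is rejected.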

DEFINITION 6 

A Büchi alternating automaton on interpretations is a 5-tuple $M = (\mathcal{L}(\Sigma \cup Q), \delta, q_0, q_v, F)$ defined in the following manner:

- $\Sigma$ is the input alphabet: $\Sigma = V \cup \neg V$, $V$ being a countable set of atomic propositions,

- $Q \cup \{q_v\}$ is a finite state set, $q_v \notin Q$ being a special state called “the valid state” of $M$,

- $\mathcal{L}(\Sigma \cup Q)$ is the free distributive lattice generated by $\Sigma \cup Q$,

- $\delta : Q \cup \{q_v\} \to \mathcal{L}(\Sigma \cup Q \cup \{q_v\})$ is a transition function verifying:

$\delta(q_v) = q_v$,

- $q_0 \in Q$ is the initial state of $M$, and

- $F$ is a set of distinguished states including $q_v$.

DEFINITION 7

A run of a Büchi alternating automaton $M = (\mathcal{L}(\Sigma \cup Q), \delta, q_0, q_v, F)$ on an interpretation $\iota = e_0 e_1 \ldots e_n \ldots \in (2^\Sigma)^\omega$ is an infinite labelled tree defined as follows:

1. every node of the tree is labelled by a state of $M$,

2. the label of the root is $q_0$, the initial state of $M$, and

3. (a) if

i. $v$ is a node of level $n$ labelled by $q$,

ii. $e_n$ is the $n$th element of the interpretation $\iota$, i.e. the $n$th element of the infinite sequence $e_0 e_1 \ldots e_n \ldots$,

iii. $\{q_1, \ldots, q_m\}$ is the set of labels of the immediate successors of $v$, and

iv. $t_1 \vee t_2 \vee \cdots \vee t_r$ is the disjunctive normal form of $\delta(q)$,

(b) then there exists $j \in \{1, \ldots, r\}$ such that:

i. $\{q_1, \ldots, q_m\} = \{p \in Q : p \text{ occurs in } t_j\}$ if $\{p \in Q : p \text{ occurs in } t_j\} \neq \emptyset$, and $\{q_1, \ldots, q_m\} = \{q_v\}$ otherwise,

ii. $\{p \in V : p \text{ occurs in } t_j\} \subseteq e_n$, and

iii. $e_n \cap \{p \in V : \neg p \text{ occurs in } t_j\} = \emptyset$.

As in the usual nondeterministic case, points 3(b)ii and 3(b)iii say intuitively that the positive literals of $t_j$ give the atomic propositions that are necessary, and that the negative ones give the atomic propositions that are forbidden; the presence or absence of the atomic propositions for which neither of the corresponding literals belongs to $t_j$ is irrelevant. We need some other definitions before giving the construction mapping an LPTL formula into a Büchi alternating automaton accepting its models.

DEFINITION 8 

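The set $\mathrm{Subf}(f)$ of subformulas of an LPTL formula $f$ is defined recursively as follows:

- if $f$ is an atomic proposition then $\mathrm{Subf}(f) = \{f\}$,

- $\mathrm{Subf}(\theta f) = \{\theta f\} \cup \mathrm{Subf}(f)$, $\theta \in \{\neg, \bigcirc, \Diamond\}$,

- $\mathrm{Subf}(f \theta g) = \{f \theta g\} \cup \mathrm{Subf}(f) \cup \mathrm{Subf}(g)$, $\theta \in \{\wedge, U\}$.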

DEFINITION 9 

An elementary formula is a formula $f$ of either of the following forms:

- $f$ is a literal, or

- $f$ is prefixed by the temporal operator $\bigcirc$, i.e. $\bigcirc$ is the main operator of $f$ ($f$ of the form $\bigcirc g$).

DEFINITION 10 

An eventuality is a formula of the form $\Diamond f$ or $\neg(f U g)$.

The following two definitions are the key points in the construction of a Büchi alternating automaton accepting the models of an LPTL formula. The first concerns the decomposition of a formula into elementary ones, and will be used for finding the transition function of the alternating automaton to associate to a formula. The second definition defines the closure of a formula, which will be used for determining the state set of the alternating automaton. These definitions are based on the following equivalences, which are straightforward from the definition of satisfiability:

(equiv 1) $\Diamond f \equiv f \vee \bigcirc \Diamond f$

(equiv 2) $f_1 U f_2 \equiv f_2 \vee (f_1 \wedge \bigcirc(f_1 U f_2))$

(equiv 3) $\neg \Diamond f \equiv \neg f \wedge \bigcirc \neg \Diamond f$

(equiv 4) $\neg(f_1 U f_2) \equiv \neg f_2 \wedge (\neg f_1 \vee \bigcirc \neg(f_1 U f_2))$

(equiv 5) $\neg \bigcirc f \equiv \bigcirc \neg f$

DEFINITION 11 

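The decomposition, $\mathrm{elem}(f)$, of an LPTL formula $f$ into elementary formulas is defined recursively as follows:

- if $f$ is a literal: $\mathrm{elem}(f) = f$,

- $\mathrm{elem}(\neg\neg f_1) = \mathrm{elem}(f_1)$,

- $\mathrm{elem}(f_1 \wedge f_2) = \mathrm{elem}(f_1) \wedge \mathrm{elem}(f_2)$,

- $\mathrm{elem}(\neg(f_1 \wedge f_2)) = \mathrm{elem}(\neg f_1) \vee \mathrm{elem}(\neg f_2)$,

- $\mathrm{elem}(\Diamond f_1) = \mathrm{elem}(f_1) \vee \bigcirc \Diamond f_1$,

- $\mathrm{elem}(f_1 U f_2) = \mathrm{elem}(f_2) \vee (\mathrm{elem}(f_1) \wedge \bigcirc(f_1 U f_2))$,

- $\mathrm{elem}(\bigcirc f_1) = \bigcirc f_1$,

- $\mathrm{elem}(\neg \Diamond f_1) = \mathrm{elem}(\neg f_1) \wedge \bigcirc \neg \Diamond f_1$,

- $\mathrm{elem}(\neg(f_1 U f_2)) = \mathrm{elem}(\neg f_2) \wedge (\mathrm{elem}(\neg f_1) \vee \bigcirc \neg(f_1 U f_2))$,

- $\mathrm{elem}(\neg \bigcirc f_1) = \bigcirc \neg f_1$.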
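Definition 11 is a direct structural recursion on the formula. Below is a minimal Python transcription, assuming formulas are encoded as nested tuples (a hypothetical encoding, as is the name `elem` for the function); it returns a lattice term over literals and $\bigcirc$-formulas built with "and" and "or".

```python
from typing import Tuple

# Hypothetical encoding: ("ap", p), ("not", f), ("and", f, g),
# ("next", f), ("ev", f), ("until", f, g).
Formula = Tuple


def elem(f: Formula) -> Formula:
    """Decomposition of an LPTL formula into elementary formulas (Definition 11)."""
    op = f[0]
    if op in ("ap", "next"):  # positive literal, or already of the form next h
        return f
    if op == "and":
        return ("and", elem(f[1]), elem(f[2]))
    if op == "ev":
        return ("or", elem(f[1]), ("next", f))
    if op == "until":
        return ("or", elem(f[2]), ("and", elem(f[1]), ("next", f)))
    if op != "not":
        raise ValueError(f"unknown operator: {op!r}")
    g = f[1]  # f = not g: dispatch on the main operator of g
    if g[0] == "ap":
        return f  # negative literal
    if g[0] == "not":
        return elem(g[1])
    if g[0] == "and":
        return ("or", elem(("not", g[1])), elem(("not", g[2])))
    if g[0] == "ev":
        return ("and", elem(("not", g[1])), ("next", f))
    if g[0] == "until":
        return ("and", elem(("not", g[2])),
                ("or", elem(("not", g[1])), ("next", f)))
    if g[0] == "next":
        return ("next", ("not", g[1]))
    raise ValueError(f"unknown operator: {g[0]!r}")
```

Each `("next", h)` left in the result marks a place where the automaton moves to the state standing for `h`.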
DEFINITION 12 

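The closure, $\mathrm{cl}(f)$, of an LPTL formula $f$ is defined recursively as follows:

- if $f$ is a literal: $\mathrm{cl}(f) = \emptyset$,

- $\mathrm{cl}(\neg\neg f_1) = \mathrm{cl}(f_1)$,

- $\mathrm{cl}(f_1 \wedge f_2) = \mathrm{cl}(f_1) \cup \mathrm{cl}(f_2)$,

- $\mathrm{cl}(\neg(f_1 \wedge f_2)) = \mathrm{cl}(\neg f_1) \cup \mathrm{cl}(\neg f_2)$,

- $\mathrm{cl}(\Diamond f_1) = \mathrm{cl}(f_1) \cup \{\Diamond f_1\}$,

- $\mathrm{cl}(f_1 U f_2) = \mathrm{cl}(f_2) \cup \mathrm{cl}(f_1) \cup \{f_1 U f_2\}$,

- $\mathrm{cl}(\bigcirc f_1) = \mathrm{cl}(f_1) \cup \{f_1\}$,

- $\mathrm{cl}(\neg \Diamond f_1) = \mathrm{cl}(\neg f_1) \cup \{\neg \Diamond f_1\}$,

- $\mathrm{cl}(\neg(f_1 U f_2)) = \mathrm{cl}(\neg f_2) \cup \mathrm{cl}(\neg f_1) \cup \{\neg(f_1 U f_2)\}$,

- $\mathrm{cl}(\neg \bigcirc f_1) = \mathrm{cl}(\neg f_1) \cup \{\neg f_1\}$.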

3.3 Büchi alternating automaton of an LPTL formula

The following theorem gives an effective construction of a Biichi alternating automaton 
accepting the models of an LPTL formula. 
Theorem 4. Let $f$ be an LPTL formula. The set of models of $f$ is the language accepted by the Büchi alternating automaton $B_f = (\mathcal{L}(\Sigma \cup Q), \delta, q_0, q_v, F)$, called the Büchi alternating automaton of $f$, defined in the following manner:

- $\Sigma = V \cup \neg V$, $V$ being the set of atomic propositions occurring in $f$,

- $Q = \{\langle f \rangle\} \cup \{\langle g \rangle : g \in \mathrm{cl}(f)\}$,

- $\delta(\langle g \rangle)$ is the result obtained from $\mathrm{elem}(g)$ by substituting, for every $h$, $\langle h \rangle$ for each non-nested² occurrence of $\bigcirc h$,

- $q_0 = \langle f \rangle$, and

- the set $F$ of distinguished states is:

$F = \{\langle g \rangle \in Q : g \text{ is not an eventuality formula}\} \cup \{q_v\}$.

Proof. The only point we clarify is the choice of the set $F$ of distinguished states. For the initial formula to be satisfied, there should exist an interpretation satisfying it, that is a model of it, on which one can construct a run of the automaton verifying the following: every time we meet on a branch a node labelled by a state $\langle g \rangle$ with $g$ being an eventuality formula, we can find, along the suffix of that branch beginning at that node, a node labelled by a state $\langle h \rangle$ such that $h$ is not an eventuality formula. Stated otherwise, the run should repeat infinitely often, on each of its branches, states $\langle g \rangle$ such that $g$ is not an eventuality. Hence the result. □
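The state set and the distinguished states of $B_f$ can be computed from the closure alone. The following minimal Python sketch transcribes Definition 12 and Definition 10 and assembles $Q$ and $F$ as in Theorem 4. The tuple encoding, the use of a formula $g$ to stand for the state $\langle g \rangle$, the string "q_v" for the valid state, and all function names are assumptions for illustration.

```python
from typing import Set, Tuple

# Hypothetical encoding: ("ap", p), ("not", f), ("and", f, g),
# ("next", f), ("ev", f), ("until", f, g).
Formula = Tuple


def cl(f: Formula) -> Set[Formula]:
    """Closure of an LPTL formula (Definition 12)."""
    op = f[0]
    if op == "ap":
        return set()
    if op == "and":
        return cl(f[1]) | cl(f[2])
    if op == "ev":
        return cl(f[1]) | {f}
    if op == "until":
        return cl(f[2]) | cl(f[1]) | {f}
    if op == "next":
        return cl(f[1]) | {f[1]}
    if op != "not":
        raise ValueError(f"unknown operator: {op!r}")
    g = f[1]
    if g[0] == "ap":
        return set()
    if g[0] == "not":
        return cl(g[1])
    if g[0] == "and":
        return cl(("not", g[1])) | cl(("not", g[2]))
    if g[0] == "ev":
        return cl(("not", g[1])) | {f}
    if g[0] == "until":
        return cl(("not", g[2])) | cl(("not", g[1])) | {f}
    if g[0] == "next":
        return cl(("not", g[1])) | {("not", g[1])}
    raise ValueError(f"unknown operator: {g[0]!r}")


def is_eventuality(f: Formula) -> bool:
    """Definition 10: a formula of the form ev g or not (g until h)."""
    return f[0] == "ev" or (f[0] == "not" and f[1][0] == "until")


def states_and_distinguished(f: Formula):
    """State set Q (each formula g standing for <g>) and set F of Theorem 4."""
    Q = {f} | cl(f)
    F = {g for g in Q if not is_eventuality(g)} | {"q_v"}
    return Q, F
```

In these small examples the state set holds at most one state per element of $\mathrm{cl}(f)$ plus $\langle f \rangle$; for $\Diamond \bigcirc p$, for instance, $Q$ has two states and only $\langle p \rangle$ and $q_v$ are distinguished.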

4. Related work 

The problem of simulating, in the case of Büchi, an alternating automaton by a usual nondeterministic one has been investigated by other authors (Miyano & Ayashi 1984; Muller & Schupp 1993). However, we believe that our method offers an easier implementation. In fact, in (Miyano & Ayashi 1984; Muller & Schupp 1993), the authors were mainly interested in proving that alternating automata have the same expressive power as usual nondeterministic automata, and their purpose was not to give an easily implementable translation.

Another automata-based approach to the satisfiability problem of LPTL is well known
in the literature (Vardi & Wolper 1986). Vardi and Wolper’s method maps an LPTL formula
into a Büchi nondeterministic automaton by performing the cross product of a first
automaton, called the “local automaton”, and a second automaton, called the “eventuality automaton”.
The local automaton and the eventuality automaton (as well as the resulting Büchi
nondeterministic automaton) are each of size single-exponential in the length of the input formula.
The Büchi nondeterministic automaton that our alternating automata approach associates with
an LPTL formula is also of size single-exponential in the length of the input formula.
However, our method is more natural; furthermore, the intermediate automaton used (the
Büchi alternating automaton) is of size linear in the length of the input formula (the size
is the cardinality of the closure).


2 An occurrence which is not in the scope of a temporal operator. 
5. Conclusion

The investigation of this paper can be viewed as an alternating automata approach to the
satisfiability problem of Linear Propositional Temporal Logic (LPTL; Pnueli 1981). We
first gave a method for translating, in the Büchi case, any alternating automaton into a
usual nondeterministic one. We then provided a method for mapping any LPTL formula
into a Büchi alternating automaton accepting its models.

Our current concern is the minimal model property for LPTL: we strongly believe that
one can use the decreasing property of very weak alternating automata (see Isli (1993) for
the class of very weak alternating automata, and (Muller et al 1988; Muller et al 1992) for
the class of weak alternating automata, which has at least equal expressive power) to
improve the minimal model property for LPTL.


I am indebted to my thesis advisor Professor Ahmed SAOUDI, who has been a major 
contributor to this work. He passed away on August 11, 1993. 
Appendix A: Proof of theorem 1

Let $M = (\mathcal{L}(Q), \Sigma, \delta, q_0, F)$ be a Büchi alternating automaton, and $u$ an infinite word
accepted by $M$. There exists an accepting run of $M$ on $u$: let $t$ be such a run. We define a
sequence $(t_n)_{n \geq 0}$ of runs of $M$ on $u$ in the following fashion:

(a) $t_0 = t$,

(b) for all $n \geq 0$, $t_{n+1}$ is obtained from $t_n$ as follows: let $NODES(t_n, n+1, q)$ be the set
of nodes of $t_n$ of level $n+1$ labelled by $q$. For all $q \in Q$ such that $|NODES(t_n, n+1, q)| \geq 2$,
we perform the following operations:

(b1) choose a node $v$ from $NODES(t_n, n+1, q)$,

(b2) for every node $v'$ belonging to $NODES(t_n, n+1, q) \setminus \{v\}$, we substitute the
subtree $t_n/v$ of $t_n$ at $v$ for the subtree $t_n/v'$ of $t_n$ at $v'$.

It is easy to see that for all $n \geq 0$, $t_n$ is an accepting run of $M$ on $u$. The limit of the
sequence $(t_n)_{n \geq 0}$ is clearly an accepting uniform run of $M$ on $u$.
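The construction of the sequence $(t_n)$ can be mimicked on a finite prefix of a run. The sketch below is hypothetical: it stores a run as an explicit tree truncated at a given depth (an infinite run and its limit cannot be stored), and it performs only the subtree substitution of steps (b1)-(b2); it does not check acceptance.

```python
import copy
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Node:
    label: str                                   # state of M labelling this node
    children: List["Node"] = field(default_factory=list)


def level_nodes(root: Node, level: int) -> List[Node]:
    """All nodes at the given level (the root is at level 0), left to right."""
    nodes = [root]
    for _ in range(level):
        nodes = [c for n in nodes for c in n.children]
    return nodes


def uniformize(root: Node, depth: int) -> Node:
    """Apply steps (b1)-(b2) for n = 0 .. depth-1 to a copy of a finite run prefix."""
    t = copy.deepcopy(root)
    for n in range(depth):
        groups: Dict[str, List[Node]] = defaultdict(list)
        for v in level_nodes(t, n + 1):
            groups[v.label].append(v)
        for same_label in groups.values():
            chosen = same_label[0]                   # (b1): choose a node v
            for other in same_label[1:]:             # (b2): t_n/v replaces t_n/v'
                other.children = copy.deepcopy(chosen.children)
    return t
```

After level $n+1$ has been processed, all nodes at that level carrying the same state have identical subtrees, which is the uniformity the limit run relies on.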
Reuse of proofs in software verification
Abstract. This paper presents a method for automated reuse of proofs in
software verification. Proofs about programs as well as proof attempts are used
to guide the verification of modified programs, particularly of program
corrections. We illustrate the phenomenon of reusability, present an evolutionary
verification process model and discuss theoretical and technical aspects.
Finally, we report on case studies with an implementation of this method in the
Karlsruhe Interactive Verifier (KIV).

Keywords. Automated reuse of proofs; software verification; Karlsruhe interactive verifier.


1. Introduction 

Currently, the technological frontier for developing correct software is somewhere between
1000 and 2000 lines of verified code per year (Moore 1988; Re92b). This productivity can
be achieved with advanced verification systems, such as the Boyer & Moore Prover (Boyer
& Moore 1979) or the KIV system (Reif 1992). Combined with a hierarchical and strictly
decompositional software design discipline, these systems can be used to verify fairly
large systems. Nevertheless, the development of verified software is still a time- and
money-consuming activity.

Most verification systems make the tacit assumption that the major problem to be solved
is to verify (affirmatively) a software system. Experience shows, however, that in
practical applications this assumption is not realistic and must be relaxed: a proof attempt
is more likely to reveal errors than to prove their absence, and the programs under
consideration might undergo several modifications until correct versions are obtained. Most
verification systems ignore this evolutionary aspect of programming. KIV overcomes this
defect: it pursues an evolutionary verification model and offers tool support for a
tight integration of error correction and verification.

The main problem to be solved is to preserve or to re-establish proofs that have become
invalid through modifications. This is illustrated by the following scenario. Let $\varphi_1(\alpha)$ and $\varphi_2(\alpha)$
be two proof obligations with a program $\alpha$ occurring in both formulas. Suppose $\varphi_1(\alpha)$
has been successfully proved, although the program $\alpha$ is erroneous. This means that the
errors in $\alpha$ do not affect the truth of $\varphi_1(\alpha)$. Assume further that a first proof attempt for
$\varphi_2(\alpha)$ failed because of the errors in $\alpha$. After correction of $\alpha$ this proof has to be repeated.
Moreover, since the program $\alpha$ has changed to $\beta$, $\varphi_1(\alpha)$ has changed to $\varphi_1(\beta)$. Therefore
the original proof for $\varphi_1(\alpha)$ (although successful) becomes obsolete. This means that for
every program correction, all proof obligations involving the corrected program have to be
proved again. Conventional verification systems repeat these proofs over and over again
without exploiting the experience accumulated during earlier proof attempts.

This paper presents a method which extracts this experience from previous proofs in 
order to guide the proofs for the corrected programs. Case studies with the KIV system 
(Heisel et al 1990; Reif 1992) have shown that large parts of earlier proofs can actually be 
reused, saving a lot of proof search and user assistance. 

The phenomenon of reusability is illustrated by an example in § 2. Section 3 investigates
the basic assumptions of the approach and sketches an evolutionary verification
methodology based on reuse of proofs. In § 4 we present the theoretical and technical aspects. In
§ 5 we report on our experiences with an implementation in the KIV system and give an
evaluation of the experimental results. Section 6 comments on related work, and in § 7 we
draw some conclusions.


2. An example 

2.1 Binary arithmetic and dynamic logic 

Consider binary words with two constants zero and one, as well as two unary constructors
s0 and s1. The constructors s0 and s1 append zero or one, respectively, at the end of a given
binary word (s1(zero) stands for the word ‘01’). The selector top selects the last bit
of a word, and pop cuts off the last bit. The unary predicate nlz(a) is true if a has no
leading zeros. Binary words without leading zeros are used as representations of natural
numbers. The two procedures succ(a : b) and pred(a : b), with input parameter a and
output parameter b, are intended to implement the arithmetical functions successor and
predecessor on binary words, respectively. Finally, the statement to be proved asserts that
pred is the inverse operation of succ, at least for inputs satisfying nlz. In the KIV system
this proof obligation is expressed as a sequent of Dynamic Logic (DL; see Harel 1979;
Heisel et al 1989):

$nlz(a) \vdash \langle succ(a : b);\ pred(b : c)\rangle\ c = a \qquad (1)$

In a sequent $\Gamma \vdash \Delta$, $\Gamma$ and $\Delta$ are lists of formulas. A sequent holds if the conjunction
of the formulas of $\Gamma$ implies the disjunction of the formulas of $\Delta$. A formula $\langle\alpha\rangle\varphi$ is true
if the program $\alpha$ terminates and $\varphi$ holds afterwards. Hence the above sequent can be read
as: if the input $a$ has no leading zeros, succ terminates for $a$ and yields a result stored in
$b$; then pred terminates with input $b$ and yields a result stored in $c$ that is equal to $a$.

The truth of (1) depends, of course, on the implementations of succ and pred (figure 1).
The program succ is given in two versions: a faulty one on the left and a correct one on
the right. In the erroneous version the case for top(x) = zero is missing. If we adopt the
implementation of pred and the second version of succ, the proof obligation (1) can be
proved. With the first version of succ, (1) is not provable. Let us compare the attempts to
prove both versions of (1).

Faulty version of succ:

    succ (x : y)
      if x = zero then y := one else
      if x = one then y := s0(one) else
      succ(pop(x) : y); y := s0(y) fi fi

Correct version of succ:

    succ (x : y)
      if x = zero then y := one else
      if x = one then y := s0(one) else
      if top(x) = zero then y := s1(pop(x)) else
      succ(pop(x) : y); y := s0(y) fi fi fi

    pred (x : y)
      if x = zero ∨ x = one then y := zero else
      if x = s0(one) then y := one else
      if top(x) = one then y := s0(pop(x)) else
      pred(pop(x) : y); y := s1(y) fi fi fi

Figure 1. The programs: two versions of succ, and pred.
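The programs of figure 1 are small enough to execute directly. This Python sketch assumes that binary words are strings over `0`/`1` (zero = "0", one = "1") and that zero itself satisfies nlz; it checks (1) by exhaustive testing up to a length bound, which can expose the fault but is no substitute for the DL proof.

```python
from itertools import product
from typing import Callable, Iterator, List

# Assumed encoding: binary words as strings over "0"/"1"; zero = "0", one = "1".
ZERO, ONE = "0", "1"


def s0(w: str) -> str:
    return w + "0"          # append the bit zero


def s1(w: str) -> str:
    return w + "1"          # append the bit one


def top(w: str) -> str:
    return w[-1]            # last bit


def pop(w: str) -> str:
    return w[:-1]           # cut off the last bit


def nlz(w: str) -> bool:
    # Assumption: the constant zero itself counts as having no leading zeros.
    return w == ZERO or w.startswith(ONE)


def succ_faulty(x: str) -> str:         # left-hand version: case top(x) = zero missing
    if x == ZERO:
        return ONE
    if x == ONE:
        return s0(ONE)
    return s0(succ_faulty(pop(x)))


def succ(x: str) -> str:                # corrected version
    if x == ZERO:
        return ONE
    if x == ONE:
        return s0(ONE)
    if top(x) == ZERO:
        return s1(pop(x))
    return s0(succ(pop(x)))


def pred(x: str) -> str:
    if x in (ZERO, ONE):
        return ZERO
    if x == s0(ONE):
        return ONE
    if top(x) == ONE:
        return s0(pop(x))
    return s1(pred(pop(x)))


def nlz_words(max_len: int) -> Iterator[str]:
    """All words without leading zeros of length <= max_len, in increasing order."""
    yield ZERO
    for n in range(1, max_len + 1):
        for bits in product("01", repeat=n - 1):
            yield ONE + "".join(bits)


def counterexamples(succ_impl: Callable[[str], str], max_len: int) -> List[str]:
    """Inputs a with nlz(a) for which pred(succ(a)) != a, i.e. where (1) fails."""
    return [a for a in nlz_words(max_len) if nlz(a) and pred(succ_impl(a)) != a]
```

With the faulty succ the first failing input is s0(one), for which succ_faulty yields s0(s0(one)); the corrected succ passes on every word tested.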

2.2 The proofs for the two versions of succ 

Starting out from the original proof obligation, a proof tree (like those in figure 2) is
constructed in a goal-directed manner. Proof rules are applied, reducing a goal to sufficient
subgoals, which themselves are reduced, and so forth. The original proof obligation is the
root of the tree (the conclusion); the yet unproved subgoals are its premises (light circles). A
proof tree stands for the assertion that the conclusion holds if the premises do. A proof is
completed if no premises remain.

KIV provides a proof strategy for DL sequents. Applying it to (1) with the erroneous
version of succ, we obtain the proof tree on the left side of figure 2. With regard to the
intended reuse of this proof, we will call it the “old” proof and the other one the “new”
proof. One open premise (59) remains: $a \neq zero,\ a \neq one \vdash top(a) = one$ states that
the last bit of every binary word different from zero and one is one. Since this is wrong,
the proof attempt fails.

Applying the proof strategy to (1) with the correct version of succ (the second one in
figure 1) yields the proof tree on the right side of figure 2. This proof is successful.

The fragments A, B, C, D of the old proof correspond directly to the fragments A, B, C, 
D of the new proof. Although the corresponding sequents in the two proofs are different, 
the same rules can be applied in the same order within the fragments. However, there are 
intermediate proof fragments F and G in the new proof without counterpart in the old proof. 
They deal with the correction of succ, an additional conditional with a new then-branch. 
Although in the example the relative order of A, B, C, D is preserved, in general this is not 
the case. 

Furthermore, the fragment E of the old proof corresponds to the fragment E’ in the new 
proof. However, the rule applied in the new proof differs from the one applied in the old, 
because the fragment E’ is carried out in a different proof context. This change of the proof 
context is due to the intermediate fragment F in the new proof. The fragment H has no 
counterpart in the new proof. 

In total, 95% of the old proof can be reused, which amounts to 85% of the new proof;
only 15% of the proof steps of the new proof are actually new. The example illustrates the
potential that can be exploited with an automated technique for reuse of proofs. However,
a careful analysis of the corrected program and the old proof is required to detect the
reusable fragments, to determine the order in which to apply them, and to cope with the
influence of modified proof contexts. Experience confirms that the above example is not
an exception but reflects a general phenomenon.

Wolfgang Reif and Kurt Stenzel


3. An evolutionary approach to verification 

Before presenting the technique for reuse in § 4, we describe how it can be incorporated 
into a conventional verification process model. The result is an evolutionary approach to 
verification. It is called evolutionary because proof attempts often reveal errors instead of 
showing their absence, and the programs are subject to several modifications until correct 
versions are obtained. 

The evolutionary approach to verification is built on top of the conventional proof
strategy used in § 2 to generate the proof on the left side of figure 2. It is called the basic
strategy and is based on symbolic execution of programs and induction. To some extent
the approach is generic and applies to other strategies as well, provided they meet two
basic requirements. First, verification with a particular strategy must be “continuous” in
the informal sense that small changes in the program text usually lead to small changes in
the proof. Second, there must be a correspondence between positions in programs and
positions in proofs generated by the strategy. Whether a strategy fulfills these requirements
depends on the underlying proof rules. For the one adopted in this paper they are satisfied.
Other examples are the methods due to Boyer & Moore (1979) and Burstall (1974).

Verification with reuse of proofs proceeds as follows. The first proof attempt for a
statement $\varphi(prog)$ about a procedure prog with a body $\alpha$ is carried out with the basic
strategy. If this attempt leads to a subgoal $g$ which seems to be unprovable, the proof is
interrupted. Let $t$ be the proof constructed so far. If the user provides a ground substitution
$s$ for the free variables of $g$ as a candidate for a counterexample for $g$, the system tries to
verify that $s$ is indeed a counterexample, hence that $g$ is not provable.

Example 1. In our example from § 2, prog is the procedure succ with the faulty
implementation (left-hand side of figure 1), and $\varphi(prog)$ is

$nlz(a) \vdash \langle succ(a : b);\ pred(b : c)\rangle\ c = a \qquad (2)$

The proof attempt leads to a subgoal (No. 59 on the left side of figure 2)

$a \neq zero,\ a \neq one \vdash top(a) = one \qquad (3)$

The user guesses $a = s_0(one)$, and KIV proves that $s_0(one)$ is indeed a counterexample
for the goal, i.e. that the negation of the goal is true for $a = s_0(one)$:

$a = s_0(one) \vdash \neg(a \neq zero \land a \neq one \rightarrow top(a) = one) \qquad (4)$

Now the goal $g$ is known to be unprovable. What does this mean? In general, there are
two possible reasons why a proof attempt may lead to an unprovable goal. Either a wrong
decision was made during the proof (in which case the faulty proof decision must be found
and withdrawn), or the original goal $\varphi(prog)$ is not provable. To find out the actual
reason, the system tries to construct a counterexample $s'$ for $\varphi(prog)$ from $s$ by inspecting
$t$. Inspecting the proof tree basically means collecting and simplifying all first-order formulas
on the relevant branch of the proof tree from $g$ backwards to the conclusion $\varphi(prog)$.
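Collecting the first-order formulas along a branch amounts to a walk over parent links from $g$ to the root. In this hypothetical sketch, goals are plain records whose first-order formulas are given as strings, and "simplification" is reduced to dropping duplicates; the goal names and formulas in the test are illustrative, not KIV output.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class Goal:
    name: str
    first_order: List[str] = field(default_factory=list)   # first-order formulas of the sequent
    parent: Optional["Goal"] = None                         # None for the conclusion phi(prog)


def branch_formulas(g: Goal) -> List[str]:
    """Collect the first-order formulas on the branch from g back to the root,
    keeping the first occurrence of each (a stand-in for real simplification)."""
    seen: Set[str] = set()
    collected: List[str] = []
    node: Optional[Goal] = g
    while node is not None:
        for phi in node.first_order:
            if phi not in seen:
                seen.add(phi)
                collected.append(phi)
        node = node.parent
    return collected
```

A real implementation would then combine the collected formulas with the substitution $s$ to derive the candidate $s'$; that step is not modelled here.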

Example 2. In our example, KIV computes the candidate counterexample $a = s_0(one)$ and
proves automatically that it is indeed a counterexample for (2). Inspection of the
counterexample proof reveals that succ(s0(one) : b) produces the output $b = s_0(s_0(one))$,
i.e. the successor of two is four. Consequently, an error is located in the
implementation of the procedure succ.

Now the original goal $\varphi(prog)$ is known to be unprovable because of an error in the
procedure body $\alpha$ of prog. Therefore, the user must provide a corrected version of prog
with body $\beta$ instead of $\alpha$, and $\varphi(prog)$ has to be proved again.

In order to enable the reuse of $t$ in the new proof, the system computes a presentation
of $\beta$ as a combination of fragments of $\alpha$ and new fragments. Then $t$ is analysed and a
correspondence is set up between the fragments of $\alpha$ and proof fragments of $t$. Finally, the
new proof attempt is guided by these proof fragments. The reuse of proofs is explained
more precisely in the next section.

The quality of the technique depends on the quality of the first proof attempt. If the 
proof idea was correct, and the proof failed due to errors in the implementation, then reuse 
usually yields good results. If the first attempt followed a wrong idea, reuse is unlikely to 
succeed in a second attempt. 


4. The technique for reuse 

4.1 Presentation of program corrections 

We represent program corrections by presenting the corrected program as a combination
of fragments of the old program and new fragments. Therefore, we need the formal
notions of (program) skeletons and (program) fragments. Roughly, skeletons are programs with
“holes”, in which other skeletons may be inserted. A fragment of a program is a skeleton
together with the position (occurrence) where it occurs in the program. If a fragment is a
common part of several programs, the skeleton is associated with several occurrences. In
the following definitions we view programs as terms and use some notation from Huet &
Oppen (1980).

DEFINITION 1

Skeletons, occurrences.

Skeletons are the smallest set containing □ (the “hole”), the statements skip, abort,
assignment $x := \tau$, procedure call $f(\sigma : y)$, and which is closed under conditional
(if $\varepsilon$ then $\alpha$ else $\beta$), definition of local variables (var $x = \tau$ in $\alpha$) (where $\alpha$ is the scope of
the definition of $x$ with initialization $\tau$), and compound $(\alpha; \beta)$.

A program is a skeleton without □. We call a skeleton $\alpha$ connected iff for each
compound $\gamma; \delta$ occurring in $\alpha$, $\gamma$ is a program.

Let $O(\alpha)$ be the set of occurrences of $\alpha$, and $p \in O(\alpha)$ a position in $\alpha$. Here, positions are
sequences of natural numbers. $\alpha/p$ is the subskeleton at position $p$ in $\alpha$, and $\alpha[p \leftarrow \gamma]$
the replacement in $\alpha$ with $\gamma$ at position $p$.

$\alpha$ is more concrete than $\beta$ ($\alpha \preceq_1 \beta$) iff $\alpha$ can be derived from $\beta$ by replacing one □ in $\beta$
by a skeleton (this replacement can be viewed as an instantiation); $\preceq$ is the reflexive,
transitive closure of $\preceq_1$. If $\alpha \preceq \beta$ holds, we call $\beta$ a pattern for $\alpha$.

For sequences $p$ and $q$, $pq$ denotes the concatenation of $p$ and $q$. For a sequence $p$ and
a set $S$ of sequences, $pS = \{pq : q \in S\}$. $()$ denotes the empty sequence.
Example 3. An example of a skeleton $\alpha$ is

    if x = zero then y := one else
    if x = one then y := s0(one) else □

This skeleton is derived from the procedure succ (figure 1) by replacing the else-part
of the second conditional by □. The set of occurrences $O(\alpha)$ contains

- (), $\alpha/()$ is $\alpha$ itself,

- (1), $\alpha/(1)$ is the then-branch of the first conditional, y := one,

- (2), $\alpha/(2)$ is the else-branch of the first conditional, i.e. the second conditional,

- (2,1), $\alpha/(2,1)$ is the then-branch of the second conditional, y := s0(one),

- (2,2), $\alpha/(2,2)$ is the else-branch of the second conditional, □.

The replacement in $\alpha$ with □ at position (2), $\alpha[(2) \leftarrow \square]$, yields

    if x = zero then y := one else □

$\alpha$ is smaller than $\alpha[(2) \leftarrow \square]$ with respect to $\preceq$, since $\alpha$ can be constructed from
$\alpha[(2) \leftarrow \square]$ by replacing □ with the second conditional of $\alpha$.
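Definition 1 and Example 3 translate into a few recursive functions on terms. The encoding below is a hypothetical one chosen for illustration: statements are opaque strings, and the test of a conditional is not a child, so that positions (1) and (2) denote the then- and else-branch as in Example 3. It is not KIV's internal representation.

```python
from typing import List, Tuple

# Hypothetical encoding of skeletons as tuples:
#   HOLE                  the hole □
#   ("stmt", text)        skip, abort, an assignment or a procedure call
#   ("if", test, a, b)    conditional; children: then (1), else (2)
#   ("var", decl, a)      local variable definition; child: scope (1)
#   ("seq", a, b)         compound a; b; children (1), (2)
HOLE = ("hole",)
Skeleton = tuple
Position = Tuple[int, ...]
FIRST_CHILD = {"if": 2, "var": 2, "seq": 1}   # tuple index of child 1


def children(s: Skeleton) -> List[Skeleton]:
    k = FIRST_CHILD.get(s[0])
    return list(s[k:]) if k is not None else []


def occurrences(s: Skeleton) -> List[Position]:
    """O(s): every position in s, as a sequence of 1-based child indices."""
    result: List[Position] = [()]
    for i, c in enumerate(children(s), start=1):
        result += [(i,) + p for p in occurrences(c)]
    return result


def at(s: Skeleton, p: Position) -> Skeleton:
    """s/p, the subskeleton at position p."""
    for i in p:
        s = children(s)[i - 1]
    return s


def replace(s: Skeleton, p: Position, new: Skeleton) -> Skeleton:
    """s[p <- new]: a copy of s with the subskeleton at p replaced by new."""
    if not p:
        return new
    k = FIRST_CHILD[s[0]] + p[0] - 1
    return s[:k] + (replace(s[k], p[1:], new),) + s[k + 1:]


def is_program(s: Skeleton) -> bool:
    return s != HOLE and all(is_program(c) for c in children(s))


def is_connected(s: Skeleton) -> bool:
    """For every compound g; d occurring in s, g must be a program."""
    if s[0] == "seq" and not is_program(s[1]):
        return False
    return all(is_connected(c) for c in children(s))


# Example 3: the skeleton derived from succ.
alpha = ("if", "x = zero", ("stmt", "y := one"),
         ("if", "x = one", ("stmt", "y := s0(one)"), HOLE))
```

Here `occurrences(alpha)` lists exactly the five positions of Example 3, and `replace(alpha, (2,), HOLE)` gives the skeleton "if x = zero then y := one else □".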

DEFINITION 2

Program fragments. Let $\gamma, \beta_1, \ldots, \beta_n$ be skeletons and $p_1, \ldots, p_n$ positions. $(\gamma, p_1, \ldots, p_n)$
is a program fragment of $(\beta_1, \ldots, \beta_n)$ iff $p_i \in O(\beta_i)$ and $\gamma$ is a pattern for $\beta_i/p_i$ for
$i = 1 \ldots n$. The fragment is called connected iff its skeleton $\gamma$ is connected.

In a program fragment $(\gamma, p_1, \ldots, p_n)$, $\gamma$ is a pattern for a subskeleton of each $\beta_i$ at the
position $p_i$, i.e. it is possible to instantiate $\gamma$ (with different skeletons for each $i$) to $\gamma_i$ such
that $\gamma_i = \beta_i/p_i$. Our aim is to compare an incorrect “old” program $\alpha$ with a correct “new”
program $\beta$. Therefore we will consider only fragments $(\gamma, p_1, p_2)$ of $(\beta, \alpha)$ and $(\gamma, p_1)$ of
$(\beta)$, and call them old and new fragments, respectively. Basically, an old fragment describes
a part of a program that can be found in both the old and the new program. The demand
for connected skeletons is important in the analysis of the old proof (§ 4.2). Now we are
prepared to define the representation of program corrections by presenting the corrected
program as a sequence of old and new fragments.
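Checking whether a skeleton is a pattern for another one is a simultaneous descent in which □ matches anything. The sketch below, under a hypothetical tuple encoding of skeletons (described in its comments), implements that check and the fragment condition of Definition 2, and tries it on the old and new bodies of succ.

```python
from typing import Optional, Sequence, Tuple

# Hypothetical encoding: HOLE, ("stmt", text), ("if", test, then, else),
# ("var", decl, scope), ("seq", first, second).
HOLE = ("hole",)
Position = Tuple[int, ...]
FIRST_CHILD = {"if": 2, "var": 2, "seq": 1}   # tuple index of child 1


def children(s: tuple) -> list:
    k = FIRST_CHILD.get(s[0])
    return list(s[k:]) if k is not None else []


def head(s: tuple) -> tuple:
    """Everything except the child skeletons (constructor, test, declaration, text)."""
    k = FIRST_CHILD.get(s[0])
    return s[:k] if k is not None else s


def at(s: tuple, p: Position) -> Optional[tuple]:
    """s/p, or None if p is not in O(s)."""
    for i in p:
        cs = children(s)
        if not 1 <= i <= len(cs):
            return None
        s = cs[i - 1]
    return s


def is_pattern(gamma: tuple, delta: tuple) -> bool:
    """True iff delta can be obtained from gamma by filling holes (delta ≼ gamma)."""
    if gamma == HOLE:
        return True
    if head(gamma) != head(delta):
        return False
    return all(is_pattern(g, d) for g, d in zip(children(gamma), children(delta)))


def is_fragment(gamma: tuple, skeletons: Sequence[tuple], positions: Sequence[Position]) -> bool:
    """(gamma, p1..pn) is a program fragment of (b1..bn), as in Definition 2."""
    if len(skeletons) != len(positions):
        return False
    for b, p in zip(skeletons, positions):
        sub = at(b, p)
        if sub is None or not is_pattern(gamma, sub):
            return False
    return True


# Old and new bodies of succ (figure 1).
D = ("seq", ("stmt", "succ(pop(x) : y)"), ("stmt", "y := s0(y)"))
old_body = ("if", "x = zero", ("stmt", "y := one"),
            ("if", "x = one", ("stmt", "y := s0(one)"), D))
new_body = ("if", "x = zero", ("stmt", "y := one"),
            ("if", "x = one", ("stmt", "y := s0(one)"),
             ("if", "top(x) = zero", ("stmt", "y := s1(pop(x))"), D)))
```

In this encoding the recursive-call skeleton `D` is an old fragment of (new_body, old_body) at positions (2,2,2) and (2,2), while the added conditional is a fragment of the new body only.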


DEFINITION 3

Presentation of $\beta$ using $\alpha$. Let $\beta$ and $\alpha$ be skeletons, and $P = (f_1, \ldots, f_n)$ a sequence with
$f_i$ a connected fragment of $(\beta, \alpha)$ or a fragment of $(\beta)$. $P$ is a presentation of $\beta$ using $\alpha$
iff $\square \bullet P$ is defined and yields $\beta$:

$$\gamma \bullet P = \begin{cases} \gamma & \text{if } P = () \\ \gamma[p_1 \leftarrow \delta] \bullet P' & \text{if } P = (\delta, p_1, \ldots, p_n)P',\ \gamma/p_1 = \square,\ n \in \{1, 2\} \\ \text{undefined} & \text{otherwise} \end{cases}$$

The function $\bullet$ tells us how to construct the new program $\beta$ from the sequence of program
fragments: begin with □ and replace □ by the skeleton $\gamma_1$ of the first fragment, then replace
one occurrence of □ in $\gamma_1$ by the skeleton of the second fragment, and so forth. If $\bullet$ is
defined, then each position in a fragment corresponds to a □ during the construction, and,
if $\beta$ is a program, eventually all occurrences of □ are eliminated. Only the first position of
each fragment (denoting a position in the new program) is used in this process; the second
position (denoting a position in the old program, if it exists) is only used in the analysis of
the old proof (§ 4.2).

Example 4. The following is a presentation $P$ of the body of the correct procedure succ
(from figure 1) using the erroneous version:

    P = ( (if x = zero then y := one else if x = one then y := s0(one) else □, (), ()),
          (if top(x) = zero then y := s1(pop(x)) else □, (2, 2)),
          (succ(pop(x) : y); y := s0(y), (2, 2, 2), (2, 2)) )

The first and third fragments are fragments of both versions of the body, hence old
fragments; the second one is a fragment of the corrected body only, hence a new fragment.
The new program can be constructed by replacing □ in the first skeleton by the second
skeleton, and then by replacing □ in the second skeleton by the third skeleton.
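The operator $\bullet$ is a left-to-right fold over the presentation. The following sketch uses a hypothetical tuple encoding of skeletons (described in its comments), returns `None` where the definition says "undefined", and replays Example 4 to rebuild the corrected body of succ from □.

```python
from typing import List, Optional, Tuple

# Hypothetical skeleton encoding: HOLE, ("stmt", text), ("if", test, then, else),
# ("var", decl, scope), ("seq", first, second); child k sits at tuple index FIRST_CHILD + k - 1.
HOLE = ("hole",)
Position = Tuple[int, ...]
FIRST_CHILD = {"if": 2, "var": 2, "seq": 1}


def children(s: tuple) -> List[tuple]:
    k = FIRST_CHILD.get(s[0])
    return list(s[k:]) if k is not None else []


def at(s: tuple, p: Position) -> Optional[tuple]:
    """s/p, or None if p is not a position of s."""
    for i in p:
        cs = children(s)
        if not 1 <= i <= len(cs):
            return None
        s = cs[i - 1]
    return s


def replace(s: tuple, p: Position, new: tuple) -> tuple:
    """s[p <- new]."""
    if not p:
        return new
    k = FIRST_CHILD[s[0]] + p[0] - 1
    return s[:k] + (replace(s[k], p[1:], new),) + s[k + 1:]


def bullet(gamma: tuple, P: List[tuple]) -> Optional[tuple]:
    """gamma • P (Definition 3), unrolled into a loop; None means 'undefined'."""
    for fragment in P:
        delta, positions = fragment[0], fragment[1:]
        if len(positions) not in (1, 2) or at(gamma, positions[0]) != HOLE:
            return None
        gamma = replace(gamma, positions[0], delta)   # only the first position is used
    return gamma


# Example 4: old fragment, new fragment, old fragment.
P = [
    (("if", "x = zero", ("stmt", "y := one"),
      ("if", "x = one", ("stmt", "y := s0(one)"), HOLE)), (), ()),
    (("if", "top(x) = zero", ("stmt", "y := s1(pop(x))"), HOLE), (2, 2)),
    (("seq", ("stmt", "succ(pop(x) : y)"), ("stmt", "y := s0(y)")), (2, 2, 2), (2, 2)),
]
new_body = ("if", "x = zero", ("stmt", "y := one"),
            ("if", "x = one", ("stmt", "y := s0(one)"),
             ("if", "top(x) = zero", ("stmt", "y := s1(pop(x))"),
              ("seq", ("stmt", "succ(pop(x) : y)"), ("stmt", "y := s0(y)")))))
```

Dropping the first fragment makes the result undefined, since the position (2,2) does not exist in □; the trivial presentation consisting of the whole new body at () works as well.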

Every program $\beta$ can be trivially presented using $\alpha$ by $P = ((\beta, ()))$. However, our
intention is to use as many fragments of the old program $\alpha$ as possible. Therefore we need
the notion of optimal presentations.

DEFINITION 4

Optimal presentation. A presentation $P$ of $\beta$ using $\alpha$ is called optimal iff $P$ fulfills the
following three conditions:

1. It is not possible to extract an old fragment from a new fragment, i.e. to find a pattern
for a subskeleton of the new fragment that is also a pattern for a subskeleton of the old
program. Formally:

There are no new fragment $(\gamma, p) \in P$, positions $q \in O(\gamma)$ and $q' \in O(\alpha)$, and
connected skeleton $\delta \neq \square$ such that $\delta$ is a pattern for $\gamma/q$ and $\alpha/q'$.

Since $\delta$ is a pattern for a part of the old program, $\delta$ should belong to an old fragment,
even if the number of fragments is increased.

2. It is not possible to merge two new fragments into one. Formally:

There are no fragments $(\gamma_1, p_1), (\gamma_2, p_2) \in P$ and position $p_3 \in O(\gamma_1)$ such that
$p_1p_3 = p_2$.

If $P$ is a presentation and $p_1p_3 = p_2$, then $\gamma_1/p_3 = \square$ holds, and it is possible
to instantiate $\gamma_1$ at $p_3$ with $\gamma_2$ and to discard the second fragment $(\gamma_2, p_2)$, thereby
reducing the number of fragments.

3. It is not possible to merge two old fragments into one. Formally:

There are no fragments $(\gamma_1, p_1, q_1), (\gamma_2, p_2, q_2) \in P$, positions $p_3 \in O(\gamma_1)$ and
$q_3 \in O(\alpha)$ such that $p_1p_3 = p_2$ and $\gamma_1[p_3 \leftarrow \gamma_2]$ is a pattern for $\alpha/q_3$.
The definition is slightly more complex than for new fragments, since it is possible
that one old fragment is instantiated with another old fragment when constructing the 
new program, but the second fragment stems from another position in the old program. 
However, if the skeleton of the combined fragment is also a pattern for part of the old 
program, both old fragments can be replaced by the combined fragment. 

Condition 1 is the most important one. It guarantees that the old program is reused as 
much as possible. The two other conditions just reduce the number of fragments. Example 
4 shows an optimal presentation that is also unique. However, this is not always the case, 
since a program may contain one skeleton at different positions, which leads to different 
optimal presentations. 

The following theorem states that any program correction can be expressed as an optimal 
presentation. 

Theorem 1. For two programs $\alpha$ and $\beta$ there exists an optimal presentation of $\beta$ using $\alpha$.

Proof sketch. For two programs $\alpha$ and $\beta$ there always exists a presentation of $\beta$ using $\alpha$:
the trivial presentation $((\beta, ()))$. Starting from this presentation it is possible to construct an
optimal presentation. Assume that one fragment violates condition 1. Then this fragment
can be replaced by one or more other fragments that do not violate condition 1, such that the new
sequence of fragments is still a presentation of $\beta$ using $\alpha$. If condition 2 or 3 is violated by
two fragments, they can be replaced by one fragment (see conditions 2 and 3 of Definition 4).
By iteration we get an optimal presentation.

4.2 Analysis of the old proof 

The next step is the analysis of the old proof with a given presentation $P$ of the new program
$\beta$ using $\alpha$. Our aim is to identify, for each program fragment $f$ of $\beta$ and $\alpha$, corresponding
fragments of the old proof.

In the first phase of the analysis of the old proof we assign to a goal $g$ a position $q$
if the goal contains the subskeleton $\alpha/q$ of the old program $\alpha$, and no position otherwise.
This assignment can be computed since each rule of the calculus dealing with programs
is extended by a description of how programs and occurrences are modified.

Here are some example rules: the symbolic execution rules conditional and assign. The
rules are reduction rules and have to be read bottom up:

$$\frac{\Gamma, \varepsilon \vdash \langle\alpha_1\rangle\varphi, \Delta \qquad \Gamma, \neg\varepsilon \vdash \langle\alpha_2\rangle\varphi, \Delta}{\Gamma \vdash \langle \textbf{if}\ \varepsilon\ \textbf{then}\ \alpha_1\ \textbf{else}\ \alpha_2\rangle\varphi, \Delta}\ \textit{conditional} \qquad \frac{\Gamma, x' = \tau \vdash \varphi_x^{x'}, \Delta}{\Gamma \vdash \langle x := \tau\rangle\varphi, \Delta}\ \textit{assign}\ (x' \text{ new})$$

The first rule (conditional) has two premises, one for a positive test $\varepsilon$ and one for a
negative test $\neg\varepsilon$. This rule is the proof-theoretical counterpart to a symbolic execution
of a conditional. If $q$ is the assigned position of the conclusion, then $\alpha/q$ = if $\varepsilon$ then
$\alpha_1$ else $\alpha_2$. Since $\alpha_1$ is the next statement in the first premise, we assign to this goal the
new position $q(1)$, and to the second premise $q(2)$. The second rule corresponds to the
symbolic execution of the assignment $x := \tau$. Since the application of this rule discards
the statement, no position is assigned to the new goal. In both cases, the conclusion of the
rule corresponds to exactly one statement of a program.
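The position bookkeeping for these two rules can be written as a function from the position of the conclusion to the positions of the premises. This is a minimal sketch covering only conditional and assign: the text does not describe how the remaining rules (for compounds or procedure calls, say) update positions, so the sketch refuses them, and the treatment of a conditional whose conclusion has no assigned position is an assumption.

```python
from typing import List, Optional, Tuple

Position = Tuple[int, ...]


def premise_positions(rule: str, q: Optional[Position]) -> List[Optional[Position]]:
    """Positions for the premises of a rule application whose conclusion has position q.
    Only the two rules discussed in the text are covered."""
    if rule == "conditional":
        if q is None:                  # assumption: a conditional outside the old program
            return [None, None]
        return [q + (1,), q + (2,)]    # then-branch (test ε), else-branch (test ¬ε)
    if rule == "assign":
        return [None]                  # the assignment is discarded: no position
    raise NotImplementedError(f"no position bookkeeping for rule {rule!r}")
```

Applied to the body of the faulty succ, a conditional at () yields positions (1) and (2) for its premises, and the conditional at (2) yields (2,1) and (2,2).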
Figure 3. The analysed proof. Left (not reproduced): the lower half of the old proof. Upper right: the subskeleton $\alpha/q$ assigned to each goal (* = no assigned position):

    1-4.   *
    5.     if x = zero then y := one else if x = one then y := s0(one) else succ(pop(x) : y); y := s0(y)
    6.     y := one
    7-11.  *
    12.    if x = one then y := s0(one) else succ(pop(x) : y); y := s0(y)
    13.    y := s0(one)
    14-19. *
    20.    succ(pop(x) : y); y := s0(y)
    21.    y := s0(y)
    22-... *

Lower right (not reproduced): the correspondence between proof fragments and program fragments.


In the next phase we identify the set $\mathcal{T}$ of proof fragments corresponding to program
fragments. For each fragment $f = (\gamma, p, q)$ in $P$ and each goal $g$ with an assigned position
$q$, the corresponding proof fragment is computed recursively over the subtree beginning
with $g$. A proof step with conclusion $g'$ belongs to the proof fragment for $f$ if the position
$q'$ assigned to $g'$ belongs to $qO(\gamma)$. The process ends if $q' \notin qO(\gamma)$ or no position is
assigned to $g'$. This approach guarantees that the resulting proof fragment forms a proof
tree in itself. The demand for connected fragments (in Def. 3) ensures that every goal $g'$
with an assigned position $q' \in qO(\gamma)$ is a member of a proof fragment in $\mathcal{T}$. The results of the analysis are
$\mathcal{T}$ and the set $\mathcal{T}'$ of proof fragments where $\alpha$ does not appear in any goal.
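The recursive extraction of a proof fragment can be sketched as a walk that stops at the first goal whose position falls outside $qO(\gamma)$. In this hypothetical sketch, `occ` is the set of non-hole positions of $\gamma$: excluding hole positions is our reading, chosen so that the goal sitting at a hole of the fragment is left as an open premise for the next fragment, which matches the premises $7_n$, $14_n$, $20_n$ of Example 6. The goal numbers and positions in the test are modelled loosely on figure 3 and are illustrative only.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set, Tuple

Position = Tuple[int, ...]


@dataclass
class ProofNode:
    goal: str                                    # goal number in the old proof
    position: Optional[Position]                 # assigned in the first phase; None for '*'
    premises: List["ProofNode"] = field(default_factory=list)


def proof_fragment(g: ProofNode, q: Position, occ: Set[Position]) -> Tuple[List[str], List[str]]:
    """Steps of the proof fragment for f = (gamma, p, q) starting at goal g, plus the goals
    at which it stops. occ: the non-hole positions of gamma (an assumption, see above)."""
    inside = {q + r for r in occ}
    steps: List[str] = []
    stops: List[str] = []

    def walk(node: ProofNode) -> None:
        if node.position is not None and node.position in inside:
            steps.append(node.goal)
            for premise in node.premises:
                walk(premise)
        else:
            stops.append(node.goal)   # q' not in qO(gamma), or no position assigned

    walk(g)
    return steps, stops
```

For the first fragment of Example 4 (non-hole positions (), (1), (2), (2,1)) the walk covers goals 5, 6, 12, 13 and stops at 7, 14 and 20.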

Example 5. We demonstrate the analysis for the example presented in § 2 (figure 3). On 
the left side the lower half of the proof is shown. On the upper right side the corresponding 
subskeletons a/q for each goal are shown (following the numbering of goals in the proof). 
* marks goals with no assigned position. The lower right side shows the correspondence 
between proof fragments and program fragments. 

4.3 The new proof 

The last step in the reuse of proofs is the proof of the original goal with the corrected
program $\beta$ instead of $\alpha$. Since this is the only difference between the old and the new goal,
the proof is likely to differ only in those parts where the corrected program is involved
somehow. In general, however, $\beta$ also introduces new fragments without any counterpart
in the old proof. Then the basic proof strategy with all its heuristics is invoked. This means
that the new proof is done partly by the basic proof strategy and partly by using fragments
of the old proof.

Figure 4. The new proof (panels: the original proof; the new proof).

The procedure starts out with the initial goal as the current goal and
proceeds recursively according to the following decision table:

- If the corrected program does not occur in the current goal of the new proof, a proof
fragment from $\mathcal{T}'$ is selected and reused.

- If the current goal refers to an old program fragment $(\gamma, p, q)$ of $\beta$ and $\alpha$, a
corresponding proof fragment from $\mathcal{T}$ for $(\gamma, p, q)$ is selected and reused.

- If the goal refers to a new program fragment $(\gamma, p)$, the basic proof strategy is invoked.

Reusing a proof fragment from $\mathcal{T}$ or $\mathcal{T}'$ basically means copying each rule application
of this fragment and checking whether it is actually applicable. If it is not, the system checks
whether this proof step can be replaced by a different one without giving up the rest of
the fragment. This flexibility is achieved by a number of elaborate heuristics. Heuristics
are also involved in the selection of suitable proof fragments from $\mathcal{T}$ or $\mathcal{T}'$: the program
fragment referred to in the current goal may be associated with more than one
proof fragment in $\mathcal{T}$, or there may be several possible selections of proof fragments in $\mathcal{T}'$.
If no further reuse is possible for a goal, the basic strategy is invoked.
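The decision table can be condensed into a single dispatch function. In this hypothetical sketch, whether the corrected program occurs in the goal, which presentation fragment the goal refers to, and whether a stored proof fragment is still available are passed in as plain values; the heuristics that actually select and adapt fragments are not modelled.

```python
from enum import Enum, auto
from typing import Optional, Tuple


class Action(Enum):
    REUSE_FROM_T_PRIME = auto()   # copy a proof fragment in which the program does not occur
    REUSE_FROM_T = auto()         # copy the proof fragment of an old program fragment
    BASIC_STRATEGY = auto()       # fall back to the basic proof strategy


def decide(program_occurs: bool, fragment: Optional[Tuple], has_candidate: bool) -> Action:
    """One step of the decision table.
    fragment: the presentation fragment the current goal refers to, either an old
    fragment (gamma, p, q) or a new fragment (gamma, p); None if there is none.
    has_candidate: whether a suitable stored proof fragment is still available."""
    if not program_occurs:
        return Action.REUSE_FROM_T_PRIME if has_candidate else Action.BASIC_STRATEGY
    if fragment is not None and len(fragment) == 3 and has_candidate:
        return Action.REUSE_FROM_T
    return Action.BASIC_STRATEGY
```

Folding the "no further reuse possible" case into the `has_candidate` flag keeps the fallback to the basic strategy in one place.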

Example 6. The procedure is explained by the example. Goals are numbered in both proofs.
To distinguish between old goals and new goals, numbers for old goals are indexed with
“o”, numbers for new goals with “n”.

The reuse strategy starts with goal $1_n$ of the new proof and selects the old proof fragment
$1_o$-$5_o$, yielding $1_n$-$5_n$. Goal $5_n$ refers to an old program fragment $f = (\gamma, p, q)$ in the
presentation of $\beta$. The corresponding proof fragment for $f$ begins at $5_o$. Application of
the rules of this fragment leads to a new proof with premises $7_n$, $14_n$, $20_n$. Goal $20_n$ refers
to a new program fragment $(\gamma', p')$ in the presentation of $\beta$. This means that no reuse is
possible. Therefore the reuse strategy calls the basic proof strategy, which performs the
proof steps $20_n$-$27_n$. The steps $7_n$-$11_n$ and $14_n$-$19_n$ result from reusing fragments of $\mathcal{T}'$.

At $28_n$ the old proof fragment $20_o$-$21_o$ is reused, even though the proof context has
changed: $top(a) \neq zero$ is an additional precondition. This modification avoids the dead
end of the old proof, but has no other influence on the proof. The remaining details are
omitted here. The resulting tree is identical to the new proof in figure 2.

5. Results 

Now we report on our experiences with the reuse strategy implemented in the KIV system. 
First of all, the algorithms for proof analysis and the strategy for reuse are efficient enough 
to be integrated into KIV’s tactic collection for interactive theorem proving. The analysis 
algorithm typically takes only one to ten percent of the time needed to carry out the new 
proof. The reuse strategy is faster than the basic proof strategy (even if there are no interactions) since there is less proof search in the reuse strategy. To illustrate its applicability we present a real example, in the sense that the errors were not introduced a posteriori but were discovered during the verification of a module for binary arithmetic.

The procedure divmod (figure 5) computes the quotient and the remainder in binary arithmetic. It is recursive and uses two other procedures, le (less or equal) and sub (subtraction). This procedure contains three errors, marked 1 to 3. The first (1) can be considered a typographical error and is corrected by replacing a variable. The second (2) is corrected by replacing a part of the program. The third (3) is corrected by deleting one statement and changing the structure of the program. The last correction in particular shows that a strategy for reuse of proofs must be able to cope with complex program corrections.
We show how these errors have been found and corrected during the attempt to prove 
four properties of this procedure. They are named divmod-r (termination of divmod), 
mod-lemma (property of the remainder), div-lemma (property of the quotient), and mod-ls 
(remainder less than divisor). 
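To make the four properties concrete, here is a sketch of the recursion that divmod follows, written in Python on ordinary integers. It is an illustrative sketch only, not the corrected KIV program. We assume the following encoding: pop(a) drops the least significant bit (a >> 1), top(a) returns that bit (a & 1), and s0/s1 append a 0 or 1 bit. The function is called divmod_bin so that it does not shadow Python's built-in divmod.

```python
def divmod_bin(a: int, b: int) -> tuple[int, int]:
    """Quotient and remainder of a by b (b >= 1) by recursive binary long division.

    Assumed encoding: pop(a) = a >> 1, top(a) = a & 1, s0(x) = 2x, s1(x) = 2x + 1.
    """
    if b < 1:
        raise ValueError("divisor must be positive")
    if a < b:                        # le(b, a) is false: quotient zero, remainder a
        return 0, a
    q, r = divmod_bin(a >> 1, b)     # recursive call on pop(a)
    r = 2 * r + (a & 1)              # s0(r) if top(a) = 0, s1(r) otherwise
    if b <= r:                       # le(b, r): one more subtraction of b fits
        return 2 * q + 1, r - b      # s1(q); subtract b from r
    return 2 * q, r                  # s0(q)


print(divmod_bin(45, 6))             # (7, 3)
```

Each recursive step keeps the invariant a = q * b + r with 0 <= r < b. That invariant is the content of div-lemma, mod-lemma and mod-ls. The argument shrinks on every call, which gives termination (divmod-r).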

The verification protocol is given below. The interesting statistics are the number of proof steps, the number of interactions with the user, and the total time needed for the proof. The basic proof strategy is partly interactive. For example, 7 interactions for a proof with 169 proof steps means that 7 proof steps were applied by the user and the remaining 162 steps were performed automatically. We measure the degree of reusability by two numbers: the percentage of the old proof steps that are reused, and the percentage of the steps of the new proof that stem from reuse. These numbers are given in the form Reuse: 97% : 95%.
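The following is a minimal sketch of the two reuse figures under one reading, which we assume here. Each new proof step records the identifier of the old step it was copied from, or None if the basic strategy produced it. A copy counts every time it occurs. This reading is consistent with the 116% reported for divmod-r below, but the text does not give KIV's exact counting, and the origins encoding is hypothetical.

```python
from typing import Optional, Sequence


def reuse_rates(old_step_count: int, origins: Sequence[Optional[int]]) -> tuple[float, float]:
    """Return (% of old steps reused, % of new steps stemming from reuse).

    origins[i] is the id of the old proof step that new step i was copied
    from, or None if the basic proof strategy produced it (hypothetical
    encoding). An old step copied twice is counted twice, so the first
    figure can exceed 100%. Both counts must be positive.
    """
    reused = sum(1 for o in origins if o is not None)
    return 100.0 * reused / old_step_count, 100.0 * reused / len(origins)


# Toy example: a 4-step old proof and a 5-step new proof, old step 1 reused twice.
print(reuse_rates(4, [0, 1, 1, None, 3]))   # (100.0, 80.0)
```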

The verification starts by proving the first goal divmod-r using the basic strategy: 

1. proof of divmod-r: successful with 169 proof steps, 7 interactions, 6 min. (even though 
the procedure contains three errors, it still terminates) 

2. proof of mod-lemma: failure after 571 proof steps, 66 interactions, 3 h 45 min. This failure is caused by the first error (marked 1 in figure 5).

3. correction of the error. New proof of mod-lemma: 566 proof steps, 0 interactions, 18 min until the situation of step 2 is reached. Reuse: 95% : 99% (without reuse, i.e. using the basic strategy to prove mod-lemma again up to the situation of step 2, 66 interactions and 3 h 45 min are needed)
[Listing: the procedure divmod(a, b : q, r), with the three errors marked 1 to 3]

Figure 5. The procedure divmod.

4. continuing the proof of mod-lemma: mod-lemma and div-lemma can now be proved. The remaining two errors become apparent during the proof of mod-ls. After their correction, mod-ls is proved.

Now the following situation is reached: All four proof obligations have been successfully 
proved. However, divmod has been modified three times and the proofs refer to different 
versions of divmod. Only the proof for mod-ls refers to the correct divmod. Therefore 
the first three goals divmod-r, mod-lemma and div-lemma have to be proved again. The 
reuse strategy is applied again: 

5. new proof of divmod-r: 220 proof steps, 3 interactions, 5 min. Reuse: 116% : 89%. The figure of 116% arises because several proof steps were reused twice (without reuse: 220 proof steps, 9 interactions, 9 min)

6. new proof of mod-lemma: 1389 proof steps, 24 interactions, 58 min. Reuse: 96% : 76% (without reuse: 1389 proof steps, 147 interactions, 6 h 45 min)

7. new proof of div-lemma: 140 proof steps, 0 interactions, 4 min. Reuse: 98% : 96% (without reuse: 140 proof steps, 42 interactions, 1 h 45 min)

This concludes the proof of the four goals. Table 1 summarizes the results. If the procedure had been correct from the beginning, we would have obtained the figures for the correct program, and each goal would have been proved exactly once. Because of the errors, some proofs had to be repeated several times. The table lists this extra work as additional with reuse and additional without reuse.

The example shows that reusability of proofs is a significant phenomenon: on average, 94% of the old proofs have been reused, and 92% of the steps in the new proofs are reused steps. With the strategy for reuse, the verification effort (the overall time, including specification, implementation, and interactive and automatic proof) is only half the time needed without it. The additional verification effort due to error corrections is 11% of the verification effort for a correct version. Without a strategy for reuse, the additional verification effort is approximately 130%. Furthermore, the strategy for reuse improved the degree of automation significantly compared to a verification without it. The example demonstrates that the technique can handle complicated proofs and considerable program corrections. A number of case studies have confirmed these experiences: reuse rates of more than 90% are typical, and the advantages of reuse grow with the size of the programs and proofs.

Table 1. Comparison of the results with or without reuse.

                              Time    Interactions    Steps
Correct program                9 h        227          1994
Additional with reuse          1 h          9          2192
Additional without reuse      12 h        255          2142
Total with reuse              10 h        236          4186
Total without reuse           21 h        482          4136
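The percentages quoted above can be recomputed from Table 1. Below is a small Python check that uses only the table's figures, with times in hours.

```python
# Figures from Table 1: (time in hours, interactions, proof steps)
correct = (9, 227, 1994)
extra_with_reuse = (1, 9, 2192)
extra_without_reuse = (12, 255, 2142)

# Totals are the correct-program figures plus the additional work.
total_with = tuple(c + e for c, e in zip(correct, extra_with_reuse))
total_without = tuple(c + e for c, e in zip(correct, extra_without_reuse))

print("Totals with reuse:", total_with)         # (10, 236, 4186)
print("Totals without reuse:", total_without)   # (21, 482, 4136)
print(f"Extra effort with reuse:    {extra_with_reuse[0] / correct[0]:.0%}")     # 11%
print(f"Extra effort without reuse: {extra_without_reuse[0] / correct[0]:.0%}")  # 133%
print(f"Time with / without reuse:  {total_with[0] / total_without[0]:.2f}")     # 0.48
```

The 133% value is the "approximately 130%" of the text. The ratio 0.48 is the "only half the time".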


6. Related work 

A certain amount of proof reuse capability is standard in most tactical theorem provers. Proof scripts and a replay mechanism can be found e.g. in NUPRL (Constable et al 1986), ISABELLE (Paulson 1994), HOL (Gordon 1988), PVS (Owre et al 1993) and others. The reuse mechanism described above, however, goes far beyond that. A machine learning approach
to proof reuse is taken in Kolbe & Walther (1994). Example proofs are generalized and 
applied to similar goals by pattern matching. This approach aims at generalizing successful 
proofs but is not adequate to handle program corrections. 

Proof reuse in the context of program corrections can be seen as a special case of analogical reasoning in the sense of Owen (1990). Let P_1 be a problem (goal) with the known solution S_1 (proof), and P_2 a new problem (corrected goal) with the yet unknown solution S_2 (proof). The problem of constructing S_2 by analogy is divided into four subproblems:

- base filtering. For P_2, find a similar problem (here P_1) which is already solved.

- analogy matching. Find a mapping between P_1 and P_2.

- plan construction. Get one or more candidate solutions for S_2 by transforming the known solution S_1 using the analogy match.

- plan validation. Check whether the candidates for S_2 are indeed solutions for P_2. Otherwise modify the candidates appropriately.

In the context of program corrections, the base filtering and the analogy matching problems are void, since P_2 is constructed from P_1. There is no search involved. The plan construction is realized by computing the presentation of the new program using the old one (§ 4.1) and analysing the old proof (§ 4.2). Plan validation is realized by the construction of the new proof (§ 4.3).
7. Conclusion

We have presented an evolutionary approach to software verification based on reuse of 
proofs. It was shown how reuse of proofs can be integrated into a conventional verification 
process model. Proof attempts for erroneous programs are used to guide the verification 
of corrected versions. 

Program corrections are formally described by presenting the corrected version of a 
program as a combination of fragments of the “old” program and new fragments. This 
presentation can be computed in such a way that the original program is reused optimally. 

Based on the presentation of program corrections, the unsuccessful proof attempt is 
analysed and a correspondence is set up between old program fragments and corresponding 
fragments of the old proof. 

This analysis is used to guide the verification of the corrected program. Since more than 
90% of the old proofs can actually be reused, this approach saves a lot of proof search 
and user interaction. However, the quality of the technique depends on the quality of the 
proofs to be reused. 

The technique is fully automated and has been tested with complicated examples, one of which we have presented here. The results show that reusing proofs significantly improves current verification technology.


This research was partly sponsored by the BMFT project KORSO. 
Weak atomicity: A helpful notion in the construction of
atomic shared variables 

K VIDYASANKAR 

Department of Computer Science, Memorial University of Newfoundland, 
St. John’s, Newfoundland, Canada A1C 5S7 

Abstract. A new class of 1-writer shared variables, called weakly atomic 
variables, is defined, and an elegant general method of constructing atomic 
variables from weakly atomic ones is presented in this paper. Four examples of 
atomic variable constructions that use this method are described. Two of these 
constructions are new. 

Weak atomicity provides an intermediate step between regularity and atomicity. In addition to enabling new constructions, this concept helps to derive simple correctness proofs of the constructions.

Keywords. Weakly atomic variables; atomic variable constructions; atomic 
shared variables. 

1. Introduction 

A shared variable is an abstraction of asynchronous interprocess (persistent) communication, where the senders and the receivers are called the writers and the readers, respectively,
and the states of the communication medium are the values of the shared variable. A writer 
writes, that is, puts a value in the variable, and a reader reads, that is, reports a value from 
the domain of the variable. Writing and reading are the only operations in a Read/Write 
variable (variable, for short, in this paper). In this paper, ‘Write’ and ‘Read’ are used as 
nouns, referring, respectively, to a write operation execution and a read operation execution, 
and ‘write’ and ‘read’ as verbs. 

Recent interest is in constructions of shared variables that have the following properties: (i) the operation executions are not assumed to be atomic, that is, they are not instantaneous; and (ii) they are wait-free, that is, no operation execution waits for any other operation to finish its execution. The first property allows operation executions to be treated uniformly, even when some operations are high-level ones implemented using low-level operations. The second property implies that each operation execution takes at most a fixed amount of time, irrespective of the presence of other operation executions and their relative speeds. It also implies that the (harmless) failure of one operation execution does not affect other operation executions.




The above two properties give rise to a classification of shared variables, depending on their output characteristics. Lamport (1986) defines three categories for 1-writer variables, using a precedence relation on operation executions defined as follows. For operation executions a and b, a precedes b, denoted a → b, if a finishes before b starts; a and b overlap if neither a precedes b nor b precedes a. We note that a 1-writer variable is written by one and only one process, and not by many processes mutually exclusively. In 1-writer variables, all the Writes are totally ordered by →. (In this paper, we are concerned only with 1-writer variables.)

DEFINITION 1 

(Awerbuch et al 1988) An execution of a shared variable construction (called a run in Awerbuch et al 1988) is a tuple (A, →, π), where A is a set of operation executions (Reads and Writes) on the shared variable, → is a precedence relation on A, and π is a partial reading mapping from the set of Reads to the set of Writes in A. For a Read r, if π(r) = w then r returns the value written by w, and if π(r) is undefined then r may return any value from the domain of the variable. □

DEFINITION 2 

A Read r in an execution (A, →, π) is regular if

(i) π(r) is defined,

(ii) r ↛ π(r), and

(iii) for no Write w in A, π(r) → w → r. □

DEFINITION 3 

(Awerbuch et al 1988) An execution (A, →, π) on a shared variable is

(a) safe if each Read in that execution that does not overlap with any Write is regular,

(b) regular if each Read, whether it overlaps with a Write or not, is regular, and

(c) atomic if it is regular and in addition there is a total order ⇒ on the set of operation executions as follows:

(i) for a, b in A, if a → b then a ⇒ b;

(ii) for each Read r in A, π(r) ⇒ r; and

(iii) for each Read r in A there is no Write w in A such that π(r) ⇒ w ⇒ r. □

DEFINITION 4 

(Awerbuch et al 1988; Lamport 1986) A shared variable is safe, regular or atomic if each 
execution on the variable is safe, regular or atomic, respectively. □ 

A shared variable is boolean or multivalued depending upon whether it can hold only 
boolean or any number of desired values. 

With these classifications we can define a hierarchy on shared variables, with 1-writer 
1-reader boolean safe variable in the lowest level and multiwriter multireader multivalued 
atomic variable in the highest level. Higher-level shared variables can be constructed from 
lower-level ones. Several such constructions have been proposed in the literature. 
In this paper we consider wait-free construction of 1-writer atomic variables from regular variables. We first state a proposition that describes the distinguishing property which makes a regular variable atomic. In 1-writer variables, the precedence relation → induces a total order on the Writes. We use the notation ≺ for this total order: for Writes a and b, a → b if and only if a ≺ b. We also use a ⪯ b to denote that a equals b or a ≺ b.

PROPOSITION 1 

(Awerbuch et al 1988; Lamport 1986) A 1-writer multireader multivalued regular shared variable is atomic if it has the additional property that, in each execution on that variable, for any two Reads r and r' such that r → r', π(r) ⪯ π(r'). □

In a regular variable, it is possible that for r, r' such that r → r', π(r') ≺ π(r). This situation is called new-old inversion. Proposition 1 identifies “no new-old inversion” as the single additional property that a regular variable requires to become atomic. The literature has employed different techniques to overcome new-old inversion, for example in the constructions of Lamport (1986, Construction 5), Vidyasankar (1991), and Burns & Peterson (1988). There is no uniformity in the techniques, and each construction warrants a different, and sometimes involved, proof of correctness.
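Definition 2 and new-old inversion can be checked mechanically on a concrete execution. The Python sketch below models each operation execution as a closed interval of integer time, in keeping with the global time model assumed in § 2, with a → b iff a finishes strictly before b starts. The intervals are illustrative choices, not taken from the paper. Both Reads come out regular, yet r → r' and π(r') ≺ π(r).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Op:
    name: str
    start: int
    finish: int


def precedes(a: Op, b: Op) -> bool:
    """a -> b: a finishes before b starts (touching endpoints count as overlap)."""
    return a.finish < b.start


def is_regular(read: Op, source: Op, writes: list[Op]) -> bool:
    """Definition 2 for a Read whose reading mapping is pi(read) = source."""
    if precedes(read, source):                 # (ii) r must not precede pi(r)
        return False
    # (iii) there is no Write w with pi(r) -> w -> r
    return not any(precedes(source, w) and precedes(w, read) for w in writes)


# Writes are totally ordered by ->; the list index gives their order.
writes = [Op("w0", 0, 1), Op("w1", 3, 4), Op("w2", 6, 7)]
order = {w.name: i for i, w in enumerate(writes)}

r = Op("r", 5, 6)          # overlaps w2
r_prime = Op("r'", 7, 8)   # follows r and still overlaps w2
pi = {r: writes[2], r_prime: writes[1]}   # r sees the new value, r' the old one

print(precedes(r, r_prime))                                        # True
print(all(is_regular(x, pi[x], writes) for x in (r, r_prime)))    # True
print("new-old inversion:", order[pi[r_prime].name] < order[pi[r].name])  # True
```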

In this paper, we first identify a general method of converting regular variable constructions to atomic ones. This method is conceptually simple, but difficult to implement. We then show that for a new class of regular variables, called weakly atomic variables, the method is easier to implement. The construction of weakly atomic variables from regular variables, in turn, appears to be a much simpler problem. Some regular variable constructions that have appeared in the literature are also weakly atomic. Others can be converted to weakly atomic ones in a simple manner. Thus, identifying weak atomicity as a possible intermediate step between regularity and atomicity simplifies the problem of constructing atomic variables from regular ones.

This paper is organized as follows. Section 2 describes three things: the above-mentioned general method of converting regular variable constructions to atomic ones, the weak atomicity concept, and a general method of converting weakly atomic variable constructions to atomic variable constructions. In § 3 we take four weakly atomic variable constructions and derive atomic variable constructions from them using the general method. Two of these weakly atomic variable constructions have been obtained by slightly modifying the regular variable constructions of Lamport (1986, Construction 4) and Chaudhuri & Welch (1990). The resulting atomic variable constructions are new. The other two weakly atomic variable constructions have been obtained the other way, from atomic variable constructions in the literature. Here our intention is to illustrate that the weak atomicity concept can provide better insight into some existing atomic variable constructions and also simplify their correctness proofs. Section 4 concludes the paper.

2. Weak atomicity 

In an execution, we define a Read r to be dependable if for all Reads r' such that r → r', π(r) ⪯ π(r'). In an execution on a regular variable, some Reads are dependable and some are not. If all Reads, in all executions, are dependable then, by proposition 1, the variable is atomic.

Let us consider the following method of converting regular variable constructions to atomic ones. Let V be a regular variable, and let read(V) and write(V) denote the Read and Write operations on V. We define a Read operation read*(V) on V as follows:

It consists of a sequence of one or more read(V) executions and returns the value read in one of them such that read*(V) is dependable; that is, if R is a read*(V) execution, then for any other read*(V) execution R' such that R → R', π(R) ⪯ π(R').

This means the following. Suppose R consists of read(V) executions r_1, r_2, ..., r_n and returns the value read by r_i for some i. Suppose also that R' consists of r'_1, r'_2, ..., r'_m and returns the value read by r'_j for some j. Then π(r_i) ⪯ π(r'_j).

There are two problems with this approach: (i) for the wait-freedom property, the number of times a read*(V) execution executes read(V) must be bounded by a constant, and (ii) the check for dependability must be feasible. Both problems are difficult in general. In the following, we define a special class of regular variables, called weakly atomic ones, for which these problems appear to be easier to solve.

First we introduce some additional terminology and state a basic property. For operation executions a and b on a shared variable, a ⇢ b will mean that a starts before b finishes. That is, if a ⇢ b, then either a precedes b or it overlaps b; in other words, b ↛ a. We also assume that if b ↛ a, then a ⇢ b. That is, we assume the global time model (Lamport 1986).

PROPOSITION 2 

(Lamport 1986) For operation executions b and c on a shared variable, and any operation executions a and d, if a → b ⇢ c → d, then a → d.


Proof. The implication follows by the transitivity of (i) a finishes before b starts, (ii) b 
starts before c finishes and (iii) c finishes before d starts. □ 
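Proposition 2 can also be confirmed by brute force on small integer intervals. In this illustrative sketch, a → b is a.finish < b.start. The dashed relation a ⇢ b is taken as "b does not precede a", that is, a.start <= b.finish, as the global time model prescribes. The requirement that b and c act on a shared variable plays no role in the interval model.

```python
from itertools import product

# Closed integer intervals (start, finish) on a small global time axis.
TIMES = range(6)
INTERVALS = [(s, f) for s in TIMES for f in TIMES if s <= f]


def precedes(a, b):
    """a -> b: a finishes before b starts."""
    return a[1] < b[0]


def starts_before_finish(a, b):
    """The dashed relation: a starts before b finishes, i.e. b does not precede a."""
    return not precedes(b, a)


checked = 0
for a, b, c, d in product(INTERVALS, repeat=4):
    if precedes(a, b) and starts_before_finish(b, c) and precedes(c, d):
        checked += 1
        assert precedes(a, d), (a, b, c, d)   # the conclusion of proposition 2
print("premise instances checked:", checked)
```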

PROPOSITION 3 

Consider an execution on a 1-writer multireader multivalued regular variable. Suppose r and r' are Reads such that r → r' and π(r) ≺ π(r'). Then for any Read r'' such that r' → r'', π(r) ⪯ π(r'').

Proof. Suppose that π(r) ⋠ π(r''). Then, since all the Writes are totally ordered in the 1-writer case, π(r'') ≺ π(r). Now, since the variable is regular, π(r') ⇢ r' by definition 2(ii). Therefore we have π(r'') → π(r) → π(r') ⇢ r' → r''. This implies, by proposition 2, that π(r'') → π(r) → r''. This violates part (iii) of definition 2, contradicting the regularity of r''. □

We note that if, for the Reads r, r' such that r → r' in the above proposition, we only have π(r) = π(r'), then π(r) ⪯ π(r'') need not hold for every r'' that succeeds r'. (A counterexample is where r, r' and r'' all overlap π(r), and π(r'') immediately precedes π(r), that is, there is no other Write between π(r'') and π(r). Here π(r'') is the most recently completed Write for r'', and hence r'' can return the π(r'') value without violating the regularity property.) We call those regular variables for which this property holds weakly atomic.

DEFINITION 5 

A 1-writer regular shared variable is weakly atomic if the following property holds in each execution on that variable: if r and r' are Reads such that r → r' and π(r) = π(r'), then for any r'' such that r' → r'', π(r) ⪯ π(r''). □

Thus weakly atomic variables have the nice property that, for a Read r in an execution, if π(r) ⪯ π(r') for some Read r' that succeeds r, then π(r) ⪯ π(r'') for all Reads r'' that succeed r'. Therefore a read*(V) execution R could consist of a sequence of one or more read(V) executions r_1, r_2, ..., r_n such that, for some i, π(r_i) ⪯ π(r_n), returning the value read by r_i. The following proposition implies that n needs to be at most 3.

PROPOSITION 4 

Consider an execution on a 1-writer multireader multivalued regular variable. For any three Reads r, r' and r'' such that r → r' → r'', if π(r') ⋠ π(r'') then π(r) ⪯ π(r').

Proof. Assume the contrary. Then π(r'') ≺ π(r') ≺ π(r), that is, π(r'') → π(r') → π(r). Since π(r) ⇢ r, we have π(r'') → π(r') → π(r) ⇢ r → r''. It follows by proposition 2 that π(r'') → π(r') → r'', contradicting the regularity of r''. □

Therefore, in a weakly atomic variable, for any three Reads r, r' and r'' such that r → r' → r'', r' is dependable when π(r') ⪯ π(r''), and r is dependable when π(r') ⋠ π(r''). Hence the operation read*(V) can be described as follows.

Function read*(V):

   r1: val1 := read(V);
   r2: val2 := read(V);
   r3: val3 := read(V);

   if π(r2) ⪯ π(r3) then return val2 else return val1.

We note that the relation ⪯ in the above function can also be replaced by =. The following proposition justifies this.

PROPOSITION 5 

Consider an execution on a 1-writer multireader multivalued regular variable. For any three Reads r, r' and r'' such that r → r' → r'', if π(r') ≠ π(r''), then either π(r) ⪯ π(r') or π(r) ⪯ π(r'').

Proof. Suppose both π(r') and π(r'') precede π(r). Now either π(r') precedes π(r'') or vice versa. In the former case, we have π(r') → π(r'') → π(r) ⇢ r → r'. This implies π(r') → π(r'') → r', contradicting the regularity of r' (part (iii) of definition 2). In the latter case, we have π(r'') → π(r') → π(r) ⇢ r → r''. This implies π(r'') → π(r') → r'', contradicting the regularity of r''. □
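Propositions 3 to 5 can be sanity-checked exhaustively on small executions. The sketch below fixes a few Writes as integer intervals and enumerates every triple of Reads r → r' → r'' together with every reading mapping allowed by definition 2. It then counts violations of the three propositions. The interval model and the chosen times are our assumptions. One Write, (6, 9), is long enough for three successive Reads to overlap it, so the enumeration also finds the counterexample noted before definition 5: a regular variable need not be weakly atomic.

```python
from itertools import product

# Global-time model: operation executions are closed integer intervals.
# Writes are totally ordered; the list index is the position in the order.
WRITES = [(0, 1), (3, 4), (6, 9), (11, 12)]   # index 0 is the initializing Write
READS = [(s, f) for s in range(2, 14) for f in range(s, 14)]


def precedes(a, b):
    """a -> b: a finishes before b starts."""
    return a[1] < b[0]


def regular_sources(read):
    """Indices i for which pi(read) = WRITES[i] satisfies definition 2."""
    ok = []
    for i, w in enumerate(WRITES):
        if precedes(read, w):
            continue                                        # violates (ii)
        if any(precedes(w, x) and precedes(x, read) for x in WRITES):
            continue                                        # violates (iii)
        ok.append(i)
    return ok


SOURCES = {r: regular_sources(r) for r in READS}


def check():
    """Return (cases, violations of props 3-5, weak-atomicity failures)."""
    cases = violations = weak_failures = 0
    for r in READS:
        for r1 in READS:
            if not precedes(r, r1):
                continue
            for r2 in READS:
                if not precedes(r1, r2):
                    continue
                for p, p1, p2 in product(SOURCES[r], SOURCES[r1], SOURCES[r2]):
                    cases += 1
                    prop3 = not p < p1 or p <= p2           # proposition 3
                    prop4 = p1 <= p2 or p <= p1             # proposition 4
                    prop5 = p1 == p2 or p <= p1 or p <= p2  # proposition 5
                    violations += not (prop3 and prop4 and prop5)
                    weak_failures += p == p1 and p2 < p     # allowed when merely regular
    return cases, violations, weak_failures


print(check())   # violations (the middle number) should be 0
```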

The above description of read*(V) is a schematic one: proper code must be substituted for checking π(r2) ⪯ π(r3) (or π(r2) = π(r3)) to get a complete function definition. The method of checking the condition depends on the variable V. It may also be possible to do the check without actually performing r3, in which case the third Read can be eliminated from the read*(V) execution. Further optimization is also possible in some cases. The constructions in the next section illustrate these points.
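The following is a minimal Python illustration of the schematic read*(V). It abstracts away how the comparison is made. It assumes a hypothetical read_tagged() that returns each value together with the position, in the order ≺, of the Write it was read from. Real constructions cannot read such tags; they must decide the test from the values read, as the constructions below do.

```python
from typing import Callable, Iterable, Tuple

# (value read, position in the Write order of the Write it was read from)
Tagged = Tuple[object, int]


def read_star(read_tagged: Callable[[], Tagged], use_equality: bool = False) -> object:
    """Schematic read*(V): three reads; return val2 if pi(r2) <= pi(r3), else val1.

    With use_equality=True the test pi(r2) = pi(r3) is used instead,
    which proposition 5 also justifies.
    """
    val1, _ = read_tagged()          # r1
    val2, w2 = read_tagged()         # r2
    _, w3 = read_tagged()            # r3
    ok = (w2 == w3) if use_equality else (w2 <= w3)
    return val2 if ok else val1


def scripted(results: Iterable[Tagged]) -> Callable[[], Tagged]:
    """Hypothetical stand-in for read(V) that replays fixed tagged results."""
    it = iter(results)
    return lambda: next(it)


print(read_star(scripted([("a", 1), ("b", 2), ("b", 2)])))   # b
print(read_star(scripted([("b", 2), ("c", 3), ("a", 1)])))   # b (r2, r3 inverted)
```

In the second call r2 and r3 exhibit a new-old inversion, so the function falls back on r1. Proposition 4 guarantees that r1 is dependable in that case.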

DEFINITION 6 

For a read*(V) operation execution R returning the value read by r_i, π(R) is defined to be π(r_i). □


3. Atomic variable constructions

In this section, we present four atomic variable constructions derived from weakly atomic variable constructions using the general procedure described in § 2. The read*(V) description involves specifying how the condition π(r2) ⪯ π(r3) or π(r2) = π(r3) is checked in the if statement. The write procedure and the read and read* functions are written in a Pascal-type language. Blocks are shown by indentation rather than with ‘begin’s and ‘end’s.

In the proofs we use the following notation. For an operation execution a on the variable V, a[x] denotes the execution of the suboperation of a on the (sub)variable x. The argument is expanded as [x = value] to indicate the value that is read or written. For an atomic variable y, ⇒_y denotes the total ordering imposed on the operation executions on y; the subscript y is omitted if it is clear from the context. For two operation executions a and b on y, we recall from definition 3 that if a → b then a ⇒ b. The property that if a ⇒ b then a ⇢ b is easy to verify.

3.1 Lamport’s Construction (Lamport 1986) (with atomic bits) 

The construction of Lamport (1986, Construction 4) implements a 1-writer multireader multivalued regular variable from 1-writer multireader boolean regular variables. Lamport has shown in the same paper that even if the boolean variables (bits) are atomic, that construction does not implement an atomic variable. We show that when the bits are atomic, the same construction implements a weakly atomic variable. Then we define read*(V) to get an atomic variable construction.

The bits are v_1, v_2, ..., v_n. The construction uses unary encoding, in which a value k is denoted by zeroes in bits 1 through k - 1 and a one in bit k. To write the value k, the writer first sets bit k to one and then sets bits k - 1 through 1 to zero, writing from right to left. A reader reads the bits from left to right (1 to n) until it finds a one. The write and read operations for V and the read operation read*(V) are as follows. We assume that the initial values of all the variables are zeroes, and that the first operation execution on V is an initializing Write that does not overlap with any Read.
Construction 1.

Procedure write(V) writing value k:
   write 1 in v_k;
   for i := k - 1 step -1 until 1 do write 0 in v_i.

Function read(V):
   for k := 1 until n do
      val := read(v_k);
      if val = 1 then return k and exit.

(* π(r) is w, where π(r[v_k]) = w[v_k] *)

Function read*(V):
   r1: k1 := read(V);
   r2: k2 := read(V);
   r3: k3 := read(V);
   if k2 ≤ k3 then return k2 else return k1.
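To make the unary encoding concrete, the following Python sketch simulates Construction 1 one bit access at a time. Each procedure is a generator that performs one bit access per step. A small scheduler, our own device and not part of the construction, interleaves the steps of a Write and a Read. Because each step touches a single list element, every bit access is atomic in the simulation. The bit array is 1-indexed to match v_1, ..., v_n.

```python
def writer(bits, k):
    """write(V) of value k: set v_k to 1, then clear v_{k-1}, ..., v_1 (right to left)."""
    bits[k] = 1
    yield
    for i in range(k - 1, 0, -1):
        bits[i] = 0
        yield


def reader(bits, n, out):
    """read(V): scan v_1, ..., v_n left to right; append the first index holding 1."""
    for k in range(1, n + 1):
        bit = bits[k]                 # one bit read per step
        yield
        if bit == 1:
            out.append(k)
            return


def run(schedule, procs):
    """Advance procs[i] one step for each i in schedule, then finish them in index order."""
    live = dict(enumerate(procs))

    def step(i):
        try:
            next(live[i])
        except StopIteration:
            del live[i]

    for i in schedule:
        if i in live:
            step(i)
    for i in sorted(live):
        while i in live:
            step(i)


N = 4


def initial_state():
    bits = [0] * (N + 1)              # bits[0] unused; v_1..v_N start at 0
    run([], [writer(bits, 1)])        # initializing Write of value 1
    run([], [writer(bits, 3)])        # a completed Write of value 3
    return bits


def overlapped_read(k, schedule):
    """Run write(V) of k concurrently with one read(V); process 0 = writer, 1 = reader."""
    bits, out = initial_state(), []
    run(schedule, [writer(bits, k), reader(bits, N, out)])
    return out[0]


print(overlapped_read(2, [0, 1, 1, 1]))   # 2: the reader sees the new value
print(overlapped_read(2, [1, 1, 0]))      # 3: the reader passed v_2 too early, sees the old value
```

Both outcomes are allowed for a Read that overlaps the Write of 2, which is what regularity permits.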

The regularity of V, even when the bits are regular, has been shown in Lamport (1986). 
In the following, we show the regularity of V (with atomic bits) to make this subsection 
self-contained, and then show the weak atomicity of V and the dependability of read*(V) 
operations. 

Lemma 1. Suppose r is a Read of V returning value k and π(r) is w. Then there is no Write w' such that, for some l ≤ k, w[v_l] ⇒ w'[v_l] ⇒ r[v_l].

Proof. Suppose on the contrary that the statement of the lemma does not hold. Let w' be the latest Write succeeding w, and l the largest index, such that w[v_l] ⇒ w'[v_l] ⇒ r[v_l]. For the case l = k, we get a contradiction to π(r[v_k]) = w[v_k]. For l < k and w' writing 1 in v_l, our choice of w' implies that r should read 1 from v_l, contradicting the assumption that it reads 0. In the remaining case, for l < k and w' writing 0 in v_l, we have w'[v_{l+1}] → w'[v_l] ⇒ r[v_l] → r[v_{l+1}], implying w[v_{l+1}] ⇒ w'[v_{l+1}] ⇒ r[v_{l+1}], which contradicts the choice of l. □

Lemma 2. The variable V is regular. 

Proof. Let r be a Read in an execution of V. We show that all three properties of definition 2 are satisfied for r. Suppose r reads 0 from v_1, ..., v_{k-1}, and 1 from v_k. It is a simple exercise to show that there is such a k. Then π(r) is defined as the Write w such that π(r[v_k]) = w[v_k]. From the atomicity, and hence the regularity, of v_k, we have r[v_k] ↛ w[v_k]. Hence it follows that r ↛ w. That there is no Write w' such that w → w' → r follows from lemma 1. □

Lemma 3. For read(V) executions r_a and r_b such that r_a → r_b, reading values k and k' respectively, if k ≤ k' then π(r_a) ⪯ π(r_b).






Proof. Suppose that k ≤ k', but π(r_b) ≺ π(r_a). Denote π(r_a) as w_a, and π(r_b) as w_b. Then we have w_b[v_k] ⇒ w_a[v_k] ⇒ r_a[v_k] ⇒ r_b[v_k], which contradicts lemma 1 (with r_b as r). □

Lemma 4. The variable V is weakly atomic. 

Proof. Suppose that, for read(V) operation executions r and r' such that r → r', π(r) = π(r'). We show that for any r'' such that r' → r'', π(r) ⪯ π(r''). Denote π(r) as w and π(r'') as w''. Let k and k'' be the values written by w and w'', respectively. If k ≤ k'', then π(r) ⪯ π(r'') by lemma 3. So assume k'' < k. Suppose π(r'') ≺ π(r). Then w'' → w[v_k] ⇒ r[v_k] → r' implies w'' → r' (by the property that if a ⇒ b then a ⇢ b, and by proposition 2). Then w''[v_{k''} = 1] ⇒ r'[v_{k''} = 0] implies that there exists some w''' such that w''[v_{k''}] ⇒ w'''[v_{k''} = 0] ⇒ r'[v_{k''}]. This implies w''[v_{k''}] ⇒ w'''[v_{k''}] ⇒ r''[v_{k''}], contradicting lemma 1 (with r'' as r). □

Theorem 1. Construction 1, with write(V) and read*(V), implements an atomic variable.

Proof. The dependability of read*(V) follows from the weak atomicity of V (lemma 4). When k2 is returned, it also uses lemma 3. When k1 is returned, it uses proposition 5 together with the fact that if k2 ≠ k3 then π(r2) ≠ π(r3). □

3.2 Tree Construction (Chaudhuri & Welch 1990) (with atomic bits) 

The construction of Chaudhuri & Welch (1990) implements a 1-writer multireader multivalued regular variable from 1-writer multireader boolean regular variables, in a way different from Lamport's construction in the above subsection. We show that here too, when the boolean variables are atomic, the same construction implements a weakly atomic variable. Then we define read*(V) to get an atomic variable construction.

The shared variables (bits) are the internal nodes of a binary tree, whose leaves correspond to the values being written. The tree represents a sort of binary search conducted by the Reads to find the value written. The Reads take a path from the root to a leaf, whereas the Writes follow the path from a leaf to the root. The path in the tree taken by a Read, along with the values read from the internal nodes, uniquely defines the value read from the tree. In the following, we describe these operations in more detail.

A Write, writing a value v, writes into the set of variables which form the path between the root and the leaf labelled v, as follows:

• The first internal node written is the parent of the leaf labelled v. If the leaf node is the left/right child, the value written is 0/1.

• The ith internal node written is the parent of the (i - 1)st node. If the (i - 1)st node is the left/right child, the value written is 0/1.

• The last node written is the root.

A Read reads the set of variables which form the path from the root to a leaf labelled v, for some v. It subsequently returns v.
• The root node is the first node read.

• Suppose the ith node read has value 0/1. Then, if its left/right child is a leaf, the value v, where v is the label of the leaf, is returned. Otherwise, the left/right child of the ith node is the (i + 1)st node read.

For the case where the variables and values form a complete binary tree, we describe the algorithm formally. Let v_m v_{m-1} ... v_1 be the binary representation of the n-ary value v, where m = log_2 n and n is a power of 2. The root variable is labelled ε. For each variable labelled with the binary string l, the strings l0 and l1 are the labels of its left and right children, respectively. We assume that the initial values of all the variables are zeroes, and that the first operation execution on V is an initializing Write that does not overlap with any Read. The operations are as follows. (The notation v_m ... v_m refers to the variable v_m, and v_m ... v_{m+1} refers to the root variable ε.)

Construction 2.

Procedure write(V) writing value v = v_m v_{m-1} ··· v_1:
    for p := 1 to m do
        write v_p in variable v_m ··· v_{p+1}.

Function read(V):
    for p := m step −1 until 1 do
        v_p := read variable v_m ··· v_{p+1};
    return v_m ··· v_1.

(* π(r) is w, where π(r[v_m ··· v_2]) = w[v_m ··· v_2] *)

Function read*(V):
    r_1: k_1 := read(V);
    r_2: k_2 := read(V);
    r_3: k_3 := read(V);
    if k_2 = k_3 then return k_2 else return k_1.
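To make the labelling concrete, Construction 2 can be sketched sequentially in Python. This is a minimal sketch under stated assumptions: the class name TreeRegister, the dictionary keyed by label strings (root = empty string) and the use of Python's binary formatting are our own choices, and operations run one at a time, so the sketch shows the path structure and the read*(V) decision rule but not the overlapping Reads and Writes that the lemmas reason about.

```python
from itertools import product


class TreeRegister:
    """Sequential sketch of Construction 2 (hypothetical class name).

    Internal nodes are labelled by binary strings: the root is "" (epsilon),
    and the children of label l are l + "0" (left) and l + "1" (right).
    Values are integers 0 .. 2**m - 1, i.e. n = 2**m leaves.
    """

    def __init__(self, m: int) -> None:
        if m < 1:
            raise ValueError("need at least one bit")
        self.m = m
        # All internal nodes (labels of length 0 .. m-1) start at 0.
        self.node = {"".join(bits): 0
                     for length in range(m)
                     for bits in product("01", repeat=length)}

    def write(self, v: int) -> None:
        if not 0 <= v < 2 ** self.m:
            raise ValueError("value out of range")
        bits = format(v, f"0{self.m}b")  # bits[0] is v_m, bits[m-1] is v_1
        # for p := 1 to m: write v_p in variable v_m ... v_{p+1}
        # (the leaf's parent is written first, the root last)
        for p in range(1, self.m + 1):
            label = bits[: self.m - p]
            self.node[label] = int(bits[self.m - p])

    def read(self) -> int:
        # for p := m down to 1: v_p := read variable v_m ... v_{p+1}
        label = ""
        for _ in range(self.m):
            label += str(self.node[label])
        return int(label, 2)

    def read_star(self) -> int:
        # Three successive reads; return k2 if k2 = k3, else k1.
        k1, k2, k3 = self.read(), self.read(), self.read()
        return k2 if k2 == k3 else k1
```

Because write updates the path bottom-up while read walks it top-down, a Read overlapping a Write may see a mixture of old and new bits; that concurrent case is what lemmas 5 to 8 address.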

The regularity of V has been shown in Chaudhuri & Welch (1990) with regular bits. In 
the following, we show the regularity with atomic bits. 

Lemma 5. Suppose $r$ is a Read of V returning value $k$ and $\pi(r)$ is $w$. Then there is no
Write $w'$ such that for some node $x$ in the path from the root to the leaf representing $k$,
$w[x] \Rightarrow w'[x] \Rightarrow r[x]$.

Proof. Suppose on the contrary that the statement of the lemma does not hold. Let $w'$
be the latest Write succeeding $w$, and $x$ be the farthest node from the root, such that
$w[x] \Rightarrow w'[x] \Rightarrow r[x]$. If $x$ is the parent of the leaf representing $k$, we get a contradiction
to $\pi(r[x]) = w[x]$. In the remaining case, assume without loss of generality that $w$ writes
0 in $x$. If $w'$ writes 1 in $x$, our choice of $w'$ implies $r$ should read 1 from $x$ and go to a
leaf different from the one representing $k$, a contradiction. If $w'$ writes 0 in $x$, we have
$w'[x0] \rightarrow w'[x] \Rightarrow r[x] \rightarrow r[x0]$, implying $w[x0] \Rightarrow w'[x0] \Rightarrow r[x0]$, which
contradicts the choice of $x$. □

Lemma 6. The variable V is regular.

Proof. Let r be a Read in an execution of V. It is straightforward to verify the first two 
properties of definition 2 for r. The third property follows from lemma 5. □ 

Lemma 7. For read(V) executions $r_a$ and $r_b$ such that $r_a \rightarrow r_b$, reading values $k$ and $k'$,
respectively, if $k = k'$ then $\pi(r_a) \preceq \pi(r_b)$.

Proof. Suppose $k = k'$ but $\pi(r_b) \prec \pi(r_a)$. Denote $\pi(r_a)$ as $w_a$, and $\pi(r_b)$ as $w_b$. Then,
if the parent of the leaf representing value $k$ is the node $x$, we have $w_b[x] \Rightarrow w_a[x] \Rightarrow
r_a[x] \Rightarrow r_b[x]$, which contradicts lemma 5 (with $r_b$ as $r$). □

Lemma 8. The variable V is weakly atomic.

Proof. Suppose for read(V) executions $r$ and $r'$ such that $r \rightarrow r'$, $\pi(r) = \pi(r')$. We
show that for any $r''$ such that $r' \rightarrow r''$, $\pi(r) \preceq \pi(r'')$. Denote $\pi(r)$ as $w$ and $\pi(r'')$
as $w''$. Let $k$ and $k''$ be the values written by $w$ and $w''$, respectively. If $k = k''$, then by
lemma 7, $\pi(r) \preceq \pi(r'')$. Assume that $k''$ is different from $k$. Suppose $\pi(r'') \prec \pi(r)$.

Let $y$ be the parent of the leaf representing $k$. Then, from $w'' \rightarrow w[y] \Rightarrow r[y] \rightarrow r'$,
we have $w'' \rightarrow r'$. Now let $x$ be the boolean variable farthest from the root common to the
paths from the root to the leaves representing $k$ and $k''$. Without loss of generality, assume
that $k$ is in the left subtree rooted at $x$ and $k''$ is in the right subtree. Then $w''[x = 1] \Rightarrow
r'[x = 0]$ implies there exists some $w'''$ such that $w''[x] \Rightarrow w'''[x = 0] \Rightarrow r'[x]$. This
implies $w''[x] \Rightarrow w'''[x] \Rightarrow r''[x]$, contradicting lemma 5 (with $r''$ as $r$). □

Theorem 2. Construction 2, with write(V) and read*(V), implements an atomic vari-
able.

Proof. The dependability of read*(V) follows from the weak atomicity of V, by lemma 8,
and (i) from lemma 7, when $k_2$ is returned, and (ii) from proposition 5 and the fact that if
$k_2 \ne k_3$ then $\pi(r_2) \ne \pi(r_3)$, when $k_1$ is returned. □

3.3 Alternating Write construction (Vidyasankar 1989) 

With write(V) and read*(V), this is an atomic variable construction from Vidyasankar
(1989). The underlying weakly atomic variable construction is brought out to provide
more insight into the atomic variable construction. Here read*(V) is an optimized version.

Here all the variables are 1-writer multireader ones. The multivalued variable V is
composed of a boolean atomic variable $c$ and two multivalued regular variables $B_0$ and $B_1$,
called buffers, which are used alternately for writing successive new values. Immediately
after $B_i$ is written, $c$ is set to $i$. (We denote the boolean values as 0 and 1.) Thus $c$ contains
the index of the buffer $B_i$ which was written by the most recent Write.
The writer uses a local (not shared) boolean variable cl. It is assumed that the initial
value of cl is either 0 or 1, and the first operation execution on V is an initializing Write
that does not overlap with any Read.

Construction 3.

Procedure write(V) writing value newval:
    cl := ¬cl;
    write newval in B_cl;
    write cl in c.

Function read(V):
    read k from c;
    read and return val from B_k.

Function read*(V):
    read k_1 from c;
    read val1 from B_{k_1};
    read k_2 from c;
    if k_2 = k_1 then return val1
    else
        read val2 from B_{k_2};
        read k_3 from c;
        if k_3 = k_2 then return val2
        else (* k_3 = k_1 *) return val1.
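A sequential Python sketch may help in tracing the branches of read*(V). The class name and the optional `pause` callback are hypothetical: each shared access is a plain attribute access, and `pause`, which is not part of the construction, lets a test run a Write between the first buffer read and the second read of c so that the second branch of read*(V) is exercised.

```python
from typing import Any, Callable, Optional


class AlternatingRegister:
    """Sequential sketch of Construction 3 (hypothetical class name).

    B[0], B[1] play the role of the buffers B_0, B_1; c holds the index of
    the buffer written by the most recent Write; _cl is the writer's local cl.
    """

    def __init__(self, initial_cl: int = 0) -> None:
        self.B: list = [None, None]
        self.c = 0
        self._cl = initial_cl          # local to the writer, not shared

    def write(self, newval: Any) -> None:
        self._cl = 1 - self._cl        # cl := not cl
        self.B[self._cl] = newval      # write newval in B_cl
        self.c = self._cl              # write cl in c

    def read(self) -> Any:
        k = self.c                     # read k from c
        return self.B[k]               # read and return val from B_k

    def read_star(self, pause: Optional[Callable[[], None]] = None) -> Any:
        k1 = self.c
        val1 = self.B[k1]
        if pause is not None:          # test hook: lets a Write run here
            pause()
        k2 = self.c
        if k2 == k1:
            return val1
        val2 = self.B[k2]
        k3 = self.c
        if k3 == k2:
            return val2
        return val1                    # here k3 = k1
```

For example, after `v.write(5)`, calling `v.read_star(pause=lambda: v.write(7))` sees c change between its first and second reads of c, reads the other buffer, and returns 7.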

A direct proof of construction 3, with write(V) and read*(V), is given in Vidyasankar
(1989). Here we give a proof using the weak atomicity concept.

We note that in this construction, for a Read $r$ that reads and returns the value from $B_i$,
$\pi(r) = w$, where $\pi(r[B_i]) = w[B_i]$.

Lemma 9. For a Read $r$ in an execution on V, $\pi(r[c]) \preceq \pi(r)$.

Proof. Assume without loss of generality that $r$ reads 0 from $c$. Denoting $\pi(r[c])$ as $w_c$,
we have $w_c[B_0] \rightarrow w_c[c] \Rightarrow r[c] \rightarrow r[B_0]$, implying $w_c[B_0] \rightarrow r[B_0]$. From the
regularity of $B_0$, it follows that $w_c \preceq \pi(r)$. □

Lemma 10. The variable V is regular.

Proof. Let $r$ be a Read in an execution on V. The first two properties of definition 2
follow from the regularity of the buffers. For the third property, suppose there is a Write
$w'$ such that $\pi(r) \rightarrow w' \rightarrow r$. Denoting $\pi(r)$ as $w$, we have $w[c] \rightarrow w'[c] \rightarrow r[c]$,
implying $w \prec \pi(r[c])$, that is, $\pi(r) \prec \pi(r[c])$, contradicting lemma 9. The assertion
follows. □

Lemma 11. For read(V) executions $r$ and $r'$ such that $r \rightarrow r'$, reading values $k$ and $k'$,
respectively, from $c$, if $k = k'$ then $\pi(r) \preceq \pi(r')$.




K Vidyasankar 


Proof. Suppose on the contrary that $\pi(r') \prec \pi(r)$. Denote $\pi(r)$ as $w$ and $\pi(r')$ as $w'$.
Assume without loss of generality that $k = k' = 0$, that is, both $w$ and $w'$ write in buffer
$B_0$ and write 0 in $c$. Since no two consecutive Writes write in the same buffer, there is a
Write $w''$, such that $w' \rightarrow w'' \rightarrow w$, writing in $B_1$ and writing 1 in $c$. Then we have
$w'[c] \rightarrow w''[c] \rightarrow w[B_0] \dashrightarrow r[B_0] \rightarrow r'[c]$, implying that $w' \prec \pi(r'[c])$, that is,
$\pi(r') \prec \pi(r'[c])$, contradicting lemma 9. □

Lemma 12. The variable V is weakly atomic.

Proof. Suppose for read(V) executions $r$ and $r'$ such that $r \rightarrow r'$, $\pi(r) = \pi(r')$. We
show that for any $r''$ such that $r' \rightarrow r''$, $\pi(r) \preceq \pi(r'')$. Denote $\pi(r)$ as $w$ and $\pi(r'')$
as $w''$. Let $k$ and $k''$ be the values that $r$ and $r''$ read, respectively, from $c$. If $k = k''$
then, by lemma 11, $\pi(r) \preceq \pi(r'')$. Suppose $k$ and $k''$ are different. Assume without loss
of generality that $k$ is 0 and $k''$ is 1.

First we claim that $w[c] \Rightarrow r'[c]$. For, suppose on the contrary that $r'[c] \Rightarrow w[c]$.
Since $r'$ reads 0 from $c$, $w$ writes 0 in $c$, and two consecutive Writes write different values
in $c$, it follows that $r'[c] \Rightarrow w_p[c]$, where $w_p$ is the Write immediately preceding $w$.
Then we have $r[B_0] \rightarrow r'[c] \Rightarrow w_p[c] \rightarrow w[B_0]$, implying $r[B_0] \rightarrow w[B_0]$. This
contradicts the regularity of $B_0$.

Since $r' \rightarrow r''$, we have $w[c] \Rightarrow r'[c] \rightarrow r''[c]$, that is, $w[c] \Rightarrow r''[c]$. That is,
$w[c = 0] \Rightarrow r''[c = 1]$, implying that, if $w^s$ is the Write that immediately follows $w$,
$w[c = 0] \rightarrow w^s[c = 1] \Rightarrow r''[c = 1]$. That is, $w \prec \pi(r''[c])$. Therefore, if $w'' \prec w$,
then $w'' \prec \pi(r''[c])$, that is, $\pi(r'') \prec \pi(r''[c])$, contradicting lemma 9.

The assertion follows. □

Theorem 3. Construction 3, with write(V) and read*(V), implements an atomic vari-
able.

Proof. The dependability of read*(V) follows from lemmas 11 and 12. □

3.4 An Optimal 1-Reader atomic variable construction 

Here, the resulting atomic variable construction, consisting of write(V) and read*(V)
operations, is a slight variation of that of Burns & Peterson (1988). However, the approach
of deriving the atomic variable, by identifying the weakly atomic variable V and using
a general method to get the read*(V) operation, is new. This approach facilitates a simpler
correctness proof.

This is a 1-writer 1-reader atomic variable construction from regular variables. A multi-
valued regular buffer b and two boolean regular variables RC and WC are used. The
operations are given in the following. The variables lr, lw, savebuff and savebuff1 are
local variables. All the variables could have any initial values. We assume an initializing
Write that precedes all other operation executions on V. Here also, read*(V) is an
optimized version.

Construction 4.

Procedure write(V) writing value newval:
    write newval in b;
    read lr from RC;
    write ¬lr in WC.

Function read(V):
    read lw from WC;
    if lw = RC then return savebuff value
    else
        write lw in RC;
        read savebuff from b;
        return savebuff value.

Function read*(V):
    read lw from WC;
    if lw = RC then return savebuff value
    else
        write lw in RC;
        read savebuff from b;
        read lw from WC;
        if lw = RC then return savebuff value
        else
            write lw in RC;
            read savebuff1 from b;
            read lw from WC;
            if lw = RC then savebuff := savebuff1;
            return savebuff value.
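A sequential Python sketch can help in following the handshake between WC and RC. The class name and the `pause` hook are hypothetical; the hook runs after the first read of b so that a test can slip a Write in at that point. RC is stored as a shared attribute even though only the reader writes it, and savebuff, local to the reader in the construction, is kept as an attribute so that it persists between reads.

```python
from typing import Any, Callable, Optional


class HandshakeRegister:
    """Sequential sketch of Construction 4 (hypothetical class name).

    The writer signals a new value by making WC differ from RC; the reader
    acknowledges by copying WC into RC before it reads the buffer b.
    """

    def __init__(self, rc: int = 0, wc: int = 0) -> None:
        self.b: Any = None
        self.RC = rc                    # written only by the reader
        self.WC = wc                    # written only by the writer
        self.savebuff: Any = None       # reader's copy of the last value read from b

    def write(self, newval: Any) -> None:
        self.b = newval                 # write newval in b
        lr = self.RC                    # read lr from RC
        self.WC = 1 - lr                # write (not lr) in WC

    def read(self) -> Any:
        lw = self.WC                    # read lw from WC
        if lw == self.RC:               # nothing new signalled: reuse savebuff
            return self.savebuff
        self.RC = lw                    # write lw in RC
        self.savebuff = self.b          # read savebuff from b
        return self.savebuff

    def read_star(self, pause: Optional[Callable[[], None]] = None) -> Any:
        lw = self.WC
        if lw == self.RC:
            return self.savebuff
        self.RC = lw
        self.savebuff = self.b
        if pause is not None:           # test hook, not part of the construction
            pause()
        lw = self.WC
        if lw == self.RC:
            return self.savebuff
        self.RC = lw
        savebuff1 = self.b
        lw = self.WC
        if lw == self.RC:
            self.savebuff = savebuff1
        return self.savebuff
```

With a Write injected through `pause`, the third check finds WC = RC, so read*(V) returns the value saved in savebuff1, matching the case analysis in the proof of theorem 4.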

Lemma 13. The variable V is regular.

Proof. Consider a Read $r$ in an execution of V. It may return a value either from buffer $b$,
or from savebuff without reading $b$. Let $r^\circ$ be the most recent Read equal to or preceding
$r$ which reads from $b$. Then $\pi(r)$ is $\pi(r^\circ)$, which equals $\pi(r^\circ[b])$, and this is defined by
the regularity of $b$. If $r \rightarrow \pi(r)$, then $r^\circ \rightarrow \pi(r^\circ)$, and hence $r^\circ[b] \rightarrow \pi(r^\circ[b])$,
a contradiction to the regularity of $b$. Therefore property (ii) of definition 2 is satisfied
for $r$. For property (iii), suppose there is a Write $w'$ such that $\pi(r) \rightarrow w' \rightarrow r$. Now
$\pi(r^\circ) \rightarrow w' \rightarrow r^\circ$ would imply $\pi(r^\circ[b]) \rightarrow w'[b] \rightarrow r^\circ[b]$, contradicting the
regularity of $b$. Thus property (iii) is satisfied for $r^\circ$. Suppose $r \ne r^\circ$. Then, from the
above discussion, we have $w'[b] \not\rightarrow r^\circ[b]$. That is, $r^\circ[b] \dashrightarrow w'[b]$. Then $r^\circ[RC] \rightarrow
r^\circ[b] \dashrightarrow w'[b] \rightarrow w'[RC] \rightarrow w'[WC] \rightarrow r[WC]$ implies $r$ will find $WC \ne RC$.
(Note that by our choice of $r^\circ$, no Read in between $r^\circ$ and $r$ writes RC since it does not
read $b$.) Hence $r$ will read $b$, contradicting the assumption that it returns savebuff value
without reading $b$. □

Lemma 14. Consider three Reads $r$, $r'$ and $r''$ such that $r \rightarrow r' \rightarrow r''$. If $r''$ reads from
$b$, then $\pi(r) \preceq \pi(r'')$.

Proof. Suppose on the contrary that $\pi(r'') \prec \pi(r)$. Denote $\pi(r)$ as $w$ and $\pi(r'')$ as
$w''$. Then $w'' \rightarrow w$. We claim that there cannot be any other Write in between $w''$
and $w$. For, if there is one, say $w'''$, then $w'' \rightarrow w''' \rightarrow w \dashrightarrow r \rightarrow r''$ implies
$w'' \rightarrow w''' \rightarrow r''$; then the assumption that $\pi(r'')$ is $w''$ contradicts the regularity of
V. Also, again from the regularity of V applied to $r''$, $w \not\rightarrow r''$, that is, $r'' \dashrightarrow w$. In fact, we have
$r''[b] \dashrightarrow w[b]$, since $r''$ reads $b$. From $w'' \rightarrow w \dashrightarrow r \rightarrow r'$, we have $w'' \rightarrow r'$.

Therefore, $w''[b] \rightarrow w''[WC] \rightarrow r'[WC] \rightarrow r''[WC] \rightarrow r''[b] \dashrightarrow w[b] \rightarrow
w[WC]$. That is, $w''[WC] \rightarrow r'[WC] \rightarrow r''[WC] \rightarrow w[WC]$. Thus both $r'$ and $r''$
read the same value of WC. Since $r'$ would have made sure that RC = WC, $r''$ would
certainly find that RC = WC, and hence would not read the buffer $b$, contrary to the
assumption. □

Lemma 15. The variable V is weakly atomic.

Proof. Consider three Reads $r$, $r'$ and $r''$ such that $r \rightarrow r' \rightarrow r''$ and $\pi(r) = \pi(r')$. Let
$r^\circ$ be a Read such that (i) it is the most recent Read equal to or preceding $r''$, (ii) it succeeds
$r'$, and (iii) it reads from $b$. From lemma 14, $\pi(r) \preceq \pi(r^\circ)$, and of course $\pi(r^\circ) = \pi(r'')$.
If no such $r^\circ$ exists, then $r''$ returns the same savebuff value as was returned by $r'$. That
is, $\pi(r) = \pi(r'')$. □

Theorem 4. Construction 4, with write(V) and read*(V), implements an atomic vari-
able.

Proof. The assertion follows from lemma 15 and the observation that the read*(V) specifi-
cation is according to the general procedure, described in § 2, with some simplification.
Nevertheless, we give a detailed proof showing that new-old inversion does not occur with
read*(V) executions.

Consider a read*(V) execution R. It starts with the first read(V) execution, say $r$.
If it returns savebuff value (line 2 of the procedure), which is the value returned by the
predecessor read*(V) execution, then clearly there is no new-old inversion. Suppose $r$
is performed to completion and the next read(V) execution, say $r'$, is started. If savebuff
value is returned, we have $\pi(r) = \pi(r')$ and the weak atomicity (lemma 15) justifies the
return of this value. In the remaining case, $r'$ is performed to completion, saving the value
read from b in savebuff1, and a third read(V) execution, say $r''$, is started. If $r''$ were to
return savebuff1 value, then again due to weak atomicity, R can return savebuff1 value.
If $r''$ were to read from b, then R returning the value read by $r$ is justified by lemmas 14
and 15. (Hence $r''$ need not read b at all, since that value is not going to be used in
any way.) □

The optimality of the atomic variable construction, with respect to the shared space
requirement, has been shown in Burns & Peterson (1988).
4. Discussion

The weak atomicity concept provides a general intermediate step in the construction of 
atomic variables from regular ones. It facilitates new constructions, and simplifies the 
correctness proofs. 

Lamport (1986) has shown that there is no construction of an atomic variable from regular
ones in which only the writer writes. Since we are able to construct atomic variables from
weakly atomic ones without the necessity of the readers writing, it follows that there is no
construction of a weakly atomic variable from regular ones in which only the writer writes.


Sibsankar Haldar’s comments on the earlier versions of this paper and the reports of two
anonymous referees helped to improve the presentation considerably.
Intelligent Systems


Foreword 

Intelligent systems are systems with the general ability to cope successfully with a
complex and changing environment. Regardless of how intelligent behaviour is achieved,
intelligent systems are built on some common underlying principles derived from
research in artificial intelligence conducted over the past four decades. These include:
(1) representation and search, (2) management of uncertainty, (3) learning ability, and
(4) problem solving. The most representative and important areas of research and
development in intelligent systems are: (1) natural language processing systems,
(2) intelligent decision support systems, (3) intelligent tutoring systems, (4) machine learning
systems, (5) intelligent robotic systems, and (6) expert systems. In the last few years,
systems based on multiple intelligent agents have become a popular area of study. These
agents are designed to possess some of the following characteristics: rationality,
autonomy, social ability, mobility, pro-activeness, selectivity and robustness. These
software robots perform functions ranging from intelligent assistants to specialist
consultants in distributed systems. Naturally, therefore, scientists from several disciplines
such as computer science, psychology, philosophy, logic, linguistics, and neuro-biology
have contributed to the growth of this area. As a consequence, neural networks and
genetic algorithms have emerged as independent disciplines coming under machine
intelligence. Intelligent systems have been developed for a variety of applications including
medical diagnosis, geological exploration, chemical data interpretation, financial
decision making, equipment fault diagnosis, and computer configuration. Some of the
major projects in this area are: (1) engineering and operation of spacecraft, (2) intelligent
multimedia/multimodal (M4) systems, (3) monitoring patients in intensive care units,
(4) management and accounting information systems, and (5) intelligent information
retrieval systems.

This special issue of Sadhana has eight papers dealing with several basic and applied
issues of intelligent systems. The paper by Dasgupta, Chakrabarti and DeSarkar deals
with heuristic search strategies for multi-objective state space search. The paper by Sarkar,
Ghose and Chakrabarti considers learning for efficient search, while the one by Siromoney
and Siromoney describes a machine learning system for identifying transmembrane
domains from amino acid sequences. The paper by Sarma and Deepak Kumar describes
an intelligent decision support system for project management. Furtado and Sen in their
article consider synthesis of unlimited speech in Indian languages, while Sengupta and
Chaudhuri consider morphological processing of Indian languages for lexical interaction.
The paper by Rajaraman and Garud deals with an application of decision tables to process
control. Finally, the role of neural networks in contract bridge bidding is discussed by
Yegnanarayana et al in their article.


June 1996 
Heuristic search strategies for multiobjective state space search

PALLAB DASGUPTA, P P CHAKRABARTI and S C DESARKAR 

Department of Computer Science and Engineering, Indian Institute of Technology,
Kharagpur 721302, India
email addresses: [pallab,ppchak,scd]@cse.iitkgp.ernet.in 

Abstract. The multiobjective search model is a framework for solving multicriteria
optimization problems using heuristic search techniques, where the
different dimensions of a multiobjective search problem are mapped into a
vector valued cost structure and partial order search is employed to determine
the set of non-inferior solutions. This new framework for solving multicriteria
optimization problems has been introduced by Stewart and White, who presented
a generalization of the well known algorithm A* in this model. This
paper presents several results on multiobjective state space search which help
in refining the scheme proposed by them. In particular, the following results
have been presented.

• The concept of pathmax has been generalized to the multiobjective framework.
It has been established that unlike in the conventional model,
multidimensional pathmax (in the multiobjective model) is useful for
non-pathological tree search instances as well.

• We investigate the utility of an induced total order on the partial order search 
mechanism. The results presented are as follows: 

- If an induced total order is used in the selection process, then in general 
it is not necessary to compute the entire set of heuristic vectors at a node. 

- In memory-bounded search, a multiobjective search strategy that uses
an induced total order for selection can back up a single cost vector
while backtracking and yet guarantee admissibility, though multiple
non-inferior candidate back-up vectors may be present in the space pruned
while backtracking.

• In this paper we study multiobjective state space search using inadmissible 
heuristics. We show that if heuristics are allowed to overestimate, then no 
algorithm is guaranteed to find all non-inferior solutions unless it expands 
dominated nodes also. 

The paper also addresses the task of multiobjective search under memory
bounds, which is important in order to make the search scheme viable for
practical purposes. The paper presents a linear space multiobjective search
strategy called MOR*0 and suggests several variants of the strategy which may
be useful under different situations.

Keywords. Heuristic search; multicriteria optimization. 


1. Introduction 

Many real world optimization problems have multiple, conflicting and non-commensurate
objectives. The task of adequately modeling such problems in a search framework that is
designed for optimizing single scalar functions is by no means easy, and has been the subject
of considerable debate in the past (Keeney & Raiffa 1976). One popular approach to solving
such problems is to cast them into the conventional search framework after combining the
multiple criteria into a single scalar criterion. However, in most multiobjective problems,
the semantics of the desired solution is context dependent and can be dictated by individual
preferences. Therefore, the task of constructing the combined evaluation function so as
to preserve the semantics of the desired solution is difficult, and may require considerable
experience in solving that problem.

The other popular approach to solving multiobjective problems is to optimize one criterion
at a time under given constraints on the others. This approach automatically preserves
the semantics of the problem since it allows the multiple dimensions to retain their
individual identities. However, one difficulty lies in determining a set of good constraints, in
the absence of which search becomes unduly expensive. Moreover, repeatedly searching
the same state space by progressively refining the constraints (until a satisfactory solution
is found) increases the search complexity enormously.

The multiobjective search model was introduced by Stewart and White (Stewart 1988;
Stewart & White 1991) as a unified framework for solving search problems involving
multiple objectives. Since multiple non-commensurate criteria are involved, the solution space
is partially ordered and will, in general, contain several non-inferior solutions. Multiobjective
search addresses the task of determining the set of such solutions in the search space.
Once the set of non-inferior solutions is found, standard procedures may be applied to
choose the desired solution (Bogetoft 1986; Kok 1986; Korhonen 1986; Korhonen et al
1986; Mond & Rosinger 1985; Vansnick 1986).

In the multiobjective framework, the costs are modelled by vectors, such that each 
dimension of the cost vector represents a distinct non-commensurate optimization criterion. 
The following partial order is used to identify the non-inferior options. 

DEFINITION 1

Dominance: Let $y_1$ and $y_2$ be two K-dimensional vectors. Then $y_1$ dominates $y_2$ iff:

1. $y_1 \le y_2$ in all the K dimensions, and

2. $y_1 < y_2$ in at least one of the K dimensions.

A vector $y_i$ is said to be “non-dominated” in a set of vectors $Y$ if there does not exist
another vector $y_j \in Y$ such that $y_j$ dominates $y_i$. □
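Definition 1 translates directly into code. In this Python sketch, smaller is better in every dimension, as in the definition; the helper non_dominated is our own addition and keeps equal vectors, since neither of two equal vectors dominates the other.

```python
from typing import List, Sequence, Tuple

Vec = Tuple[float, ...]


def dominates(y1: Sequence[float], y2: Sequence[float]) -> bool:
    """Definition 1: y1 <= y2 in all K dimensions and y1 < y2 in at least one."""
    if len(y1) != len(y2):
        raise ValueError("vectors must have the same dimension K")
    return (all(a <= b for a, b in zip(y1, y2))
            and any(a < b for a, b in zip(y1, y2)))


def non_dominated(Y: Sequence[Vec]) -> List[Vec]:
    """The vectors of Y that no other vector of Y dominates."""
    return [y for y in Y if not any(dominates(z, y) for z in Y)]
```

For instance, among (1, 4), (4, 1), (3, 3) and (3, 4), only (3, 4) is dominated (by (3, 3)).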
A multiobjective search strategy uses the above partial order to eliminate all clearly
inferior alternatives and direct the search towards the set of non-dominated solutions. The
multiobjective heuristic search problem is as follows:

DEFINITION 2
Multiobjective search problem:

Given:

1. A search space, represented as a locally finite directed graph.

2. A vector valued cost structure, with each dimension representing a distinct
optimization criterion.

3. A heuristic evaluation function that returns a set of non-dominated vector
valued costs for each candidate search avenue. Each vector is an estimate
of the potential non-dominated solutions which may be obtained along
that search avenue.

Find:

The set of non-dominated solutions in the search space. □

It should be noted that in the multiobjective search space a node may have multiple
heuristic vectors. This is due to the fact that a node may lie on the path to multiple
mutually non-dominated solutions. For every solution path that contains the given node, the
multiobjective heuristic function may compute a vector valued estimate of the corresponding
solution cost. Some of these vectors computed at the node may not be distinct, since the
same vector may estimate the cost vectors of more than one solution. However, if a single
vector were to estimate the cost vectors of all the solutions, then the heuristic value would
become overly restrictive. Therefore, a typical multiobjective heuristic function computes a
set of non-dominated heuristic vectors at every node.

In their work, Stewart and White (Stewart 1988; Stewart & White 1991) presented an
algorithm MOA* which is a generalization of the well known A* algorithm (Nilsson 1980)
to the multiobjective search framework. Like A*, the algorithm maintains a list (called
OPEN) of tip nodes of the different search paths. A node in OPEN is considered to be
a candidate for expansion if one or more cost vectors of the node are non-dominated by
the cost vectors of other nodes in OPEN and by the solution vectors found so far. The
algorithm terminates when there are no more non-dominated nodes in OPEN. Stewart and
White have shown that when the heuristic function is admissible, the algorithm MOA*
terminates with the entire set of non-dominated solutions. Several well known results from
the conventional search model have been extended to the multiobjective framework. For
example, it has been shown that if A and A' are two versions of MOA* such that A is at least
as informed as A', then A expands no more nodes than A'. They have also established that
monotonicity of the multiobjective heuristic function implies admissibility.
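The candidate and termination rules just described can be illustrated with a much simplified Python sketch for trees. This is not Stewart and White's MOA* in full: it assumes a tree (one g-vector per node), integer cost tuples, a heuristic H(n) given as a non-empty list of vectors (zero at goals), goals treated as leaves, and it simply expands the first candidate in OPEN order. The name moa_star_tree and its parameters are hypothetical.

```python
from typing import Callable, Dict, Hashable, List, Sequence, Tuple

Vec = Tuple[int, ...]


def dominates(a: Sequence[int], b: Sequence[int]) -> bool:
    """Definition 1: a <= b in every dimension and a < b in at least one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def moa_star_tree(start: Hashable,
                  children: Dict[Hashable, List[Tuple[Hashable, Vec]]],
                  is_goal: Callable[[Hashable], bool],
                  H: Callable[[Hashable], List[Vec]],
                  dim: int) -> List[Vec]:
    """Simplified MOA*-style search on a tree; returns the goal costs found."""
    open_list: List[Tuple[Hashable, Vec]] = [(start, (0,) * dim)]
    solutions: List[Vec] = []
    while True:
        # F(n) = {g(n) + h : h in H(n)} for every node in OPEN.
        f_sets = [[tuple(gi + hi for gi, hi in zip(g, h)) for h in H(n)]
                  for n, g in open_list]
        rivals = [f for fs in f_sets for f in fs] + solutions
        # Candidate: a node with some f-vector that no OPEN f-vector
        # and no solution vector found so far dominates.
        chosen = next((i for i, fs in enumerate(f_sets)
                       if any(not any(dominates(o, f) for o in rivals)
                              for f in fs)), None)
        if chosen is None:            # no non-dominated node left in OPEN
            return solutions
        node, g = open_list.pop(chosen)
        if is_goal(node):
            solutions.append(g)       # record the cost of this solution path
        else:
            for child, cost in children.get(node, []):
                open_list.append((child, tuple(a + b for a, b in zip(g, cost))))
```

With the trivially admissible heuristic H(n) = {(0, 0)} on a small tree whose goals cost (2, 5), (5, 2), (4, 4) and (4, 5), the sketch returns the first three; the goal of cost (4, 5) is never selected because (4, 4) dominates it.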

This paper addresses two broad topics. The first is to study multiobjective state space
search under two different types of heuristics, namely non-monotone and inadmissible
heuristics. The other is to develop search strategies for the multiobjective framework that
operate under space constraints. Most of the analyses in this paper assume that the state
space is an implicitly specified tree.
In this paper we generalize the concept of pathmax to the multiobjective framework
and show that in this model the utility of pathmax extends over a wider class of problem
instances than in the conventional search model. For the problem of searching with
inadmissible heuristics we show that in general only brute force search strategies are guaranteed
to find all non-dominated solutions. We also analyze specific conditions under which a
best-first search strategy such as MOA* is admissible.

One of the major contributions of the present work has been to investigate the utility 
of using an induced total order to guide the selection process. In this paper we establish 
several results which show that the policy of using an induced total order called K-order 
is useful in many situations. The concepts of pathmax and K-ordering lead to an improved 
version of the algorithm MOA*. The new algorithm is called MOA**. 

The problem of multiobjective heuristic search in restricted memory has been considered
in this paper. We show that the main difficulty in extending known memory bounded
search techniques from the conventional model arises from the existence of multiple
candidate back-up cost vectors in the pruned space. We then establish that if an induced total
order such as K-order is used to guide the partial order search, then it is possible to
construct a strategy that backs up a single cost vector while backtracking and yet guarantees
admissibility. A linear space search strategy called MOR*0 is presented on these lines.
Several variants of the strategy are suggested to cater to specific situations.

The paper is organized as follows. Section 3 presents the generalization of pathmax to
the multiobjective search scheme and establishes the related results. The policy of using an
induced total order to guide partial order heuristic search is considered in § 4. In § 5 we
consider the utility of incorporating K-ordering in the algorithm MOA*. Section 6 addresses the
problem of multiobjective state space search in bounded memory. The algorithm MOR*Q
is presented in the same section. In § 7 we analyze the problem of multiobjective search
using inadmissible heuristics and present related results.


2. Preliminary notations and definitions 

Most of the notations used in this paper are standard notations adopted in heuristic search 
literature. We therefore highlight only the characteristic terminologies of multiobjective 
heuristic search. 

+ : If a and b are vectors (of equal dimension) then a + b denotes the
vector formed by summing a and b in each individual dimension.

− : If a and b are vectors (of equal dimension) then a − b denotes the
vector formed by subtracting b from a in each individual dimension.

vmax(a, b): If a and b are vectors (of equal dimension) then vmax(a, b)
denotes a vector which is equal to the maximum of a and b in each
individual dimension.

H(n) : The non-dominated set of heuristic vectors computed at node n.

G(n) : G(n) is the vector valued cost of the path from the start node s
to node n in the search tree.

F(n) : The non-dominated set of cost vectors of node n computed as h(n) + g(n),
where h(n) ∈ H(n) and g(n) ∈ G(n). In the case of trees g(n) is
the same as G(n).
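The vector operations and the set F(n) can be written down directly. The following Python sketch assumes cost vectors are tuples of equal length; when forming F(n) it removes exact duplicates as well as dominated vectors, which is our own choice. For trees, G(n) is passed as a one-element list.

```python
from typing import List, Sequence, Tuple

Vec = Tuple[int, ...]


def _check(a: Sequence[int], b: Sequence[int]) -> None:
    if len(a) != len(b):
        raise ValueError("vectors must have equal dimension")


def vadd(a: Sequence[int], b: Sequence[int]) -> Vec:
    """a + b, dimension by dimension."""
    _check(a, b)
    return tuple(x + y for x, y in zip(a, b))


def vsub(a: Sequence[int], b: Sequence[int]) -> Vec:
    """a - b, dimension by dimension."""
    _check(a, b)
    return tuple(x - y for x, y in zip(a, b))


def vmax(a: Sequence[int], b: Sequence[int]) -> Vec:
    """Maximum of a and b in each individual dimension."""
    _check(a, b)
    return tuple(max(x, y) for x, y in zip(a, b))


def dominates(a: Sequence[int], b: Sequence[int]) -> bool:
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def f_set(H_n: Sequence[Vec], G_n: Sequence[Vec]) -> List[Vec]:
    """F(n): non-dominated vectors among h(n) + g(n), h(n) in H(n), g(n) in G(n)."""
    sums = list(dict.fromkeys(vadd(h, g) for h in H_n for g in G_n))  # dedupe, keep order
    return [f for f in sums if not any(dominates(o, f) for o in sums)]
```

For example, with H(n) = {(1, 2), (2, 1)} and the single path cost (1, 1), F(n) is {(2, 3), (3, 2)}.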
The definitions of admissible and monotone heuristics are as follows.
DEFINITION 3 

Admissible heuristics: A multiobjective heuristic function is said to be admissible if the 
set H(n) of heuristic vectors computed at each node n satisfies the following property: 

Admissibility property: For every solution path P(s,y) through node n (where s is the 
start node and y is any goal node), there exists a heuristic vector h(n) in H(n) such 
that h(n) either dominates the cost vector of the solution path from n to y or is equal 
to it. 

It should be noted that in one extreme case H(n) may contain only a single vector which
satisfies the above property, while in the other extreme case H(n) may contain a distinct
vector corresponding to each solution path through node n. □

DEFINITION 4

Monotone heuristics: A multiobjective heuristic function is said to be monotone (or
consistent) if for all nodes $n_i$ and $n_j$, such that $n_j$ is a successor of $n_i$, the set $H(n_i)$ of heuristic
vectors computed at node $n_i$ satisfies the following property:

Monotonicity property: For each heuristic vector $h(n_j)$ in $H(n_j)$, there exists a heuristic
vector $h(n_i)$ in $H(n_i)$, such that $h(n_i)$ either dominates the vector $c(n_i, n_j) + h(n_j)$ or is equal
to it. $c(n_i, n_j)$ denotes the cost vector of the edge from $n_i$ to $n_j$. □
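Both properties can be checked mechanically on small examples. The following Python sketch (function names are hypothetical) uses the fact that "h either dominates v or is equal to it" is the same as h ≤ v in every dimension. Note that admissible_at needs the true costs of all solution paths below n, which a search algorithm does not know; the checker is meant only for testing heuristics on small, fully enumerated instances.

```python
from typing import Sequence, Tuple

Vec = Tuple[int, ...]


def dominates_or_equal(a: Sequence[int], b: Sequence[int]) -> bool:
    """'a dominates b or is equal to it' is the same as a <= b in every dimension."""
    if len(a) != len(b):
        raise ValueError("vectors must have equal dimension")
    return all(x <= y for x, y in zip(a, b))


def admissible_at(H_n: Sequence[Vec], path_costs_from_n: Sequence[Vec]) -> bool:
    """Definition 3 at one node: each solution-path cost below n is covered by some h(n)."""
    return all(any(dominates_or_equal(h, cost) for h in H_n)
               for cost in path_costs_from_n)


def monotone_on_edge(H_ni: Sequence[Vec], H_nj: Sequence[Vec], c_ij: Vec) -> bool:
    """Definition 4 for one edge (n_i, n_j) with edge cost vector c_ij."""
    return all(any(dominates_or_equal(h_i, tuple(c + h for c, h in zip(c_ij, h_j)))
                   for h_i in H_ni)
               for h_j in H_nj)
```

Here H(n) = {(1, 3), (3, 1)} is admissible for path costs (2, 4) and (4, 2), whereas H(n) = {(3, 3)} overestimates the first dimension of (2, 4) and fails the check.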

3. Multidimensional pathmax 

Pathmax is a standard concept in heuristic search which is used to strengthen admissible 
non-monotonic heuristics during search. The basic idea is to use the cumulative heuristic 
information available along the search path to compute the heuristic cost of a node, so that 
the heuristic cost of the node is consistent with the heuristic cost of its ancestors. 

The idea of using pathmax to propagate the cost of the parent node to the child node 
was suggested by Mero (1984). Pathmax has been found to be useful on the following 
accounts. 

• Dechter & Pearl (1985) and Gelperin (1977) have shown that by using pathmax it is
possible to develop A*-like algorithms that are superior to A* in terms of the set of nodes
expanded. However, they have also shown that such extensions of A* may expand a
smaller set of nodes only in pathological problem instances, that is, in problem instances
where every solution path contains at least one fully informed non-goal node. It has
been shown (Dechter & Pearl 1985) that in non-pathological cases the same set of
nodes will be expanded irrespective of using pathmax.

• Martelli (1977) and Bagchi & Mahanti (1983) have shown that it may be possible to
reduce the number of nodes re-expanded by A* in graph search problems considerably
if a strategy similar to pathmax is used. Chakrabarti et al (1989) have also used this
result in memory bounded search.

Therefore, in the conventional search model, the utility of pathmax in tree search is limited
to pathological problem instances only. In the case of graphs, pathmax may be used to
reduce the number of node re-expansions in non-pathological cases as well. However, the
set of nodes expanded remains the same whether or not pathmax is used.

In this section, we generalize the concept of pathmax to the multiobjective search model. 
We shall show that in the multiobjective search framework pathmax may reduce the set of 
nodes expanded even in non-pathological cases. 

3.1 The definition of pathmax 

When the successor m of node n is generated, each heuristic vector h(m) evaluated by the 
heuristic function at node m is updated using pathmax as follows: 

For each vector h(n) in H(n)

   For each vector h(m) evaluated at node m

      1. Create a new vector h'(m) as follows:

         $$h'(m) \leftarrow \operatorname{vmax}\{h(m),\; h(n) - c(n, m)\}$$

         where c(n, m) denotes the cost vector of the edge (n, m), and
         the function vmax is as defined in § 2.

      2. Put h'(m) in H(m) and remove dominated vectors from H(m), if any.

Thus pathmax is used in the same way as in the single objective search model, but for 
every heuristic vector and along each dimension of the vector. 
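The update of § 3.1 can be sketched in Python as below. The function vmax is defined in § 2; the sketch assumes it is the componentwise maximum, which is consistent with the remark that pathmax works along each dimension of the vector. Function names and sample vectors are hypothetical.

```python
from typing import List, Tuple

Vec = Tuple[float, ...]


def vmax(u: Vec, v: Vec) -> Vec:
    # Assumption: vmax (section 2) is the componentwise maximum.
    return tuple(max(a, b) for a, b in zip(u, v))


def vsub(u: Vec, v: Vec) -> Vec:
    return tuple(a - b for a, b in zip(u, v))


def dominates(u: Vec, v: Vec) -> bool:
    return all(a <= b for a, b in zip(u, v)) and u != v


def prune_dominated(vectors: List[Vec]) -> List[Vec]:
    """Keep vectors not dominated by any other vector (duplicates kept once)."""
    kept: List[Vec] = []
    for v in vectors:
        if v not in kept and not any(dominates(u, v) for u in vectors):
            kept.append(v)
    return kept


def pathmax(H_n: List[Vec], H_m_raw: List[Vec], c_nm: Vec) -> List[Vec]:
    """Multidimensional pathmax for a successor m of n (section 3.1)."""
    H_m: List[Vec] = []
    for h_n in H_n:
        for h_m in H_m_raw:
            # Step 1: h'(m) <- vmax{h(m), h(n) - c(n, m)}
            H_m.append(vmax(h_m, vsub(h_n, c_nm)))
    # Step 2, applied once at the end: pruning after every insertion
    # leaves the same final set, since dominance is transitive.
    return prune_dominated(H_m)


print(pathmax([(3, 3), (1, 6)], [(1, 1)], (1, 2)))  # [(2, 1), (1, 4)]
print(pathmax([(3, 3), (2, 2)], [(1, 1)], (1, 1)))  # [(1, 1)]
```

In the second call both parent vectors lift h(m) = (1, 1), to (2, 2) and (1, 1) respectively, and the weaker bound (1, 1) dominates (2, 2), so only (1, 1) is kept; this is the vector that keeps H(m) admissible.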

3.2 Two basic properties of pathmax 

In order to establish the validity of multidimensional pathmax in the multiobjective
framework, we prove the following two basic properties. The proofs consider trees only.

Theorem 1. The set of heuristics remains admissible if pathmax is used.

Proof. We first prove that if the set of heuristics at a node n is admissible, then the set
of heuristics at each immediate successor node m, which is admissible to begin with,
remains admissible when pathmax is applied.

If the set of heuristics at node n is admissible, then corresponding to every solution path
to a goal node y through node n, there exists h(n) in H(n) such that h(n) either dominates
h*(n, y) or is equal to it, where h*(n, y) denotes the cost vector of the path from n to y. If
the same solution path contains node m, then h*(n, y) = c(n, m) + h*(m, y), so
h(n) - c(n, m) either dominates or is equal to h*(m, y). Now since h(m) is admissible, it
follows that h(m) also dominates h*(m, y) or is equal to it. Each component of the vmax of
these two vectors is the larger of two values that are both bounded by the corresponding
component of h*(m, y). Therefore, the heuristic vector h'(m) computed using pathmax as
follows will also dominate h*(m, y) or be equal to it:

$$h'(m) \leftarrow \operatorname{vmax}\{h(m),\; h(n) - c(n, m)\}.$$

Thus H(m) either contains h'(m) or some vector which dominates h'(m). It follows that 
the set of heuristics H(m) computed using pathmax is admissible. 

We have shown that the set of heuristics at a node remains admissible provided the set
of heuristics at its parent node is admissible. Since the set of heuristics at the start node
remains the same, the result follows by induction. □

Theorem 2. Every node expanded by any admissible algorithm A using pathmax is also
expanded by A without pathmax in the worst case.

Proof. Each heuristic vector computed at a node without pathmax either equals or
dominates some heuristic vector computed using pathmax. Therefore, if a node and its
ancestors have one or more cost vectors that are non-dominated when computed using
pathmax, then the node and its ancestors also have non-dominated cost vectors when
pathmax is not used. Since any admissible algorithm A will have to explore all paths with
non-dominated costs in the worst case, the result follows. It may be noted, however, that in
certain individual instances anomalies may occur due to tie resolution, so that the
worst-case set of nodes is not expanded. □

3.3 The significance of pathmax 

We now establish the result that in the multiobjective search framework pathmax may be 
used to reduce the set of nodes expanded in non-pathological problem instances as well. 

Theorem 3. There exist non-pathological problem instances (that is, problem instances
in which not all solution paths contain a fully informed non-goal node) where an arbitrarily
large number of extra nodes will be expanded unless pathmax is used.

Proof. Consider the tree in figure 1. The cost vector of each edge is shown beside the
edge. The heuristic vectors computed without pathmax are shown beside the nodes. For
simplicity, we consider only 2-dimensional costs and only one heuristic vector per node.
It should be noted that no non-goal node along the solution path to node n_4 is fully
informed (in either dimension).

When n_1 is expanded, the heuristic vector (1,13) is computed at node n_3 and c(n_1, n_3)
is found to be (1,1). From this F(n_3) is computed as (2,14). Likewise, F(n_2) is computed
as (2,10). Since (2,10) dominates (2,14), n_2 will be expanded earlier than n_3 to generate
n_4. Then F(n_4) is computed as (3,12). Now either n_4 or n_3 may be selected. Irrespective
of which is selected first, the expansion of n_3 is certain, since (2,14) is not dominated by
(3,12). If H(n_5) is computed without pathmax, then F(n_5) = (4,9), and the expansion of
n_5 is also certain since (4,9) is non-dominated by the cost (3,12) of node n_4. On the
other hand, using pathmax, H(n_5) is computed as (2,12). Therefore F(n_5) becomes
(4,14), which is dominated by the cost (3,12) of n_4. Thus n_4 will be selected earlier and
the solution of cost (3,12) will be obtained. It follows that n_5 will never be expanded.

It is easy to see that the expansion of node n_5 may lead to the expansion of an arbitrarily
large number of nodes in the subtree rooted at node n_5, each of which, without pathmax,
has some heuristic vector non-dominated by (3,12). The result follows. □
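The arithmetic in this proof can be re-checked mechanically. The Python sketch below takes vmax to be the componentwise maximum and n_1 to be the root with G(n_1) = (0,0). The edge cost c(n_3, n_5) = (1,1) and the raw heuristic h(n_5) = (2,7) are not quoted in the text; they are back-computed from the quoted values (with pathmax, G(n_5) = F(n_5) - H(n_5) = (4,14) - (2,12) = (2,2); without it, F(n_5) = (4,9)), so they are reconstructions rather than values read off figure 1.

```python
def vadd(u, v):
    return tuple(a + b for a, b in zip(u, v))


def vsub(u, v):
    return tuple(a - b for a, b in zip(u, v))


def vmax(u, v):
    # Assumed definition of vmax: componentwise maximum.
    return tuple(max(a, b) for a, b in zip(u, v))


def dominates(u, v):
    return all(a <= b for a, b in zip(u, v)) and u != v


G_n3 = vadd((0, 0), (1, 1))      # G(n_3) = G(n_1) + c(n_1, n_3)
h_n3 = (1, 13)
c_n3_n5 = (1, 1)                 # back-computed (see lead-in)
h_n5_raw = (2, 7)                # back-computed (see lead-in)
G_n5 = vadd(G_n3, c_n3_n5)
sol_n4 = (3, 12)                 # cost of the solution through n_4

F_n5_plain = vadd(G_n5, h_n5_raw)                 # without pathmax
h_n5_pm = vmax(h_n5_raw, vsub(h_n3, c_n3_n5))     # with pathmax
F_n5_pm = vadd(G_n5, h_n5_pm)

print(vadd(G_n3, h_n3))                               # (2, 14)
print(F_n5_plain, dominates(sol_n4, F_n5_plain))      # (4, 9) False
print(h_n5_pm, F_n5_pm, dominates(sol_n4, F_n5_pm))   # (2, 12) (4, 14) True
```

The last two lines reproduce the two cases of the proof: without pathmax n_5 survives the dominance test against (3,12); with pathmax it does not.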


Figure 1. Multiobjective tree illustrating the advantage of pathmax.


4. An induced total ordering: K-ordering

Efficiency of problem solving in the multiobjective search framework is largely dependent
on effective partial order search strategies for finding the set of non-dominated solutions. A
characteristic feature of partial order search is the presence of multiple non-inferior search
paths, each of which must be explored in order to guarantee admissibility. The set of nodes
in OPEN that have one or more non-dominated cost vectors represents these non-inferior
search paths, and these nodes must therefore be expanded by admissible search strategies
like MOA*. We formally define non-dominated nodes in OPEN as follows:

DEFINITION 5 

Non-dominated node: A node n in OPEN is said to be non-dominated if at least one of the
cost vectors in F(n) is non-dominated by all cost vectors of every other node in OPEN
and by the cost vectors of all solution nodes found so far.
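Definition 5 translates directly into a filter over OPEN. In the Python sketch below, OPEN is modelled, purely for illustration, as a dict from a hypothetical node label to its list of cost vectors F(n), and the solution costs found so far are a plain list; dominance is strict, so a vector equal to a rival is not considered dominated.

```python
from typing import Dict, List, Tuple

Vec = Tuple[float, ...]


def dominates(u: Vec, v: Vec) -> bool:
    return all(a <= b for a, b in zip(u, v)) and u != v


def non_dominated_nodes(open_list: Dict[str, List[Vec]], solutions: List[Vec]) -> List[str]:
    """Labels of the nodes in OPEN that satisfy Definition 5."""
    result = []
    for n, F_n in open_list.items():
        # Every cost vector of every other node in OPEN, plus all solution costs.
        rivals = [v for m, F_m in open_list.items() if m != n for v in F_m]
        rivals += solutions
        if any(not any(dominates(r, f) for r in rivals) for f in F_n):
            result.append(n)
    return result


OPEN = {"a": [(2, 5)], "b": [(3, 6)], "c": [(4, 1), (5, 5)]}
print(non_dominated_nodes(OPEN, solutions=[(1, 9)]))  # ['a', 'c']
```

Identifying this set compares every vector against every other vector in OPEN, which is the work that the induced total order discussed next lets a search strategy avoid.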


It may easily be shown that the order in which the non-dominated nodes in OPEN are
selected does not affect the total number of nodes expanded by a strategy like MOA* in
the worst case. We shall present a proof of this result later in this paper. In this section we
introduce the policy of using an induced total order to select a non-dominated node from
OPEN. While this policy does not affect the number of nodes expanded by strategies such
as MOA*, we shall show that there are several other benefits of using the policy.

A basic advantage of using an induced total ordering is that we then have a mechanism
for identifying a non-dominated node from OPEN without having to identify the entire
set of non-dominated nodes in OPEN. This is easily possible if the induced total order is
defined in the following manner.

DEFINITION 6 

K-ordering: Let y^1 and y^2 be two K-dimensional vectors. Then, based on K-ordering,
y^1 > y^2 if

$$\exists j,\ 1 \le j \le K, \text{ such that } y^1_j > y^2_j \text{ and } \forall i < j,\ y^1_i = y^2_i.$$

It essentially represents the policy of comparing two vector-valued costs on the basis of
the first component, breaking ties on the basis of the second component, breaking any
remaining ties on the basis of the third component, and so on. The other relational
operators (such as = and <) are defined in a likewise manner.
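Definition 6 is ordinary lexicographic comparison. For tuples of equal length, Python's built-in ordering is already lexicographic, so the explicit loop below only mirrors the definition. The example also checks the observation that a minimum in K-order is never dominated. Names and values are illustrative.

```python
from typing import List, Tuple

Vec = Tuple[float, ...]


def k_greater(y1: Vec, y2: Vec) -> bool:
    """Definition 6: y1 > y2 in K-order if, at the first index j where
    they differ, y1_j > y2_j."""
    for a, b in zip(y1, y2):
        if a != b:
            return a > b
    return False  # the vectors are equal


def dominates(u: Vec, v: Vec) -> bool:
    return all(a <= b for a, b in zip(u, v)) and u != v


costs: List[Vec] = [(3, 1, 4), (2, 7, 1), (2, 6, 9)]
print(k_greater((2, 7, 1), (2, 6, 9)), (2, 7, 1) > (2, 6, 9))  # True True
best = min(costs)  # built-in tuple order is the same lexicographic order
print(best, any(dominates(c, best) for c in costs))  # (2, 6, 9) False
```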


The basic idea is derived from the single objective search strategies where the data
structure OPEN is ordered on the basis of the costs of the nodes. It is easy to see that the
minimum cost vector in K-order is always non-dominated in a set of cost vectors: any vector
that dominated it would be smaller at the first component where the two differ, and hence
smaller in K-order. Since a node may have several cost vectors (in F(n)), we define the
representative cost vector of a node based on the induced total order as follows.

DEFINITION 7 

The representative cost vector of a node: The representative cost vector of node n is
the minimum, in K-order, of those cost vectors in F(n) that are non-dominated by the cost
vectors of all solutions found earlier.

If the representative cost vector of a node becomes dominated by the cost vector of 
some solution found during the course of the search, then the next non-dominated cost 
(in K-order) becomes the current representative cost of the node. The representative cost 
vector of a path in the state space is the same as the representative cost vector of its tip node. 
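The following Python sketch shows how the representative cost vector moves on to the next non-dominated vector in K-order as solutions accumulate. Strict dominance is used, so a vector equal to a solution cost is kept. Returning None when every vector of F(n) is dominated is a convention of this sketch, and the sample vectors are hypothetical.

```python
from typing import List, Optional, Tuple

Vec = Tuple[float, ...]


def dominates(u: Vec, v: Vec) -> bool:
    return all(a <= b for a, b in zip(u, v)) and u != v


def representative(F_n: List[Vec], solutions: List[Vec]) -> Optional[Vec]:
    """Definition 7: the K-order minimum of the vectors in F(n) that no
    solution found so far dominates; None if there is no such vector."""
    live = [f for f in F_n if not any(dominates(s, f) for s in solutions)]
    return min(live) if live else None


F_n = [(2, 9), (3, 4), (5, 1)]
print(representative(F_n, []))                  # (2, 9)
print(representative(F_n, [(1, 8)]))            # (3, 4)
print(representative(F_n, [(1, 8), (2, 3)]))    # (5, 1)
print(representative(F_n, [(1, 1)]))            # None
```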

In this paper, we shall show that using an induced total ordering such as K-ordering to 
guide the partial order search may be useful in several situations. The main results are as 
follows: 

• In § 5, we show that if K-ordering is used to guide the search, then in some cases it may
not be necessary to evaluate all the heuristic vectors at a node. This result is particularly
useful in problems where generating the heuristics is costly.

• In § 5, we also observe that in two-objective problems (which are quite numerous) 
we can simplify the process of dominance checking by using K-ordering to guide the 
search. 

• In § 6, we discuss the advantage of using K-ordering in memory bounded search. In 
particular, we show that though the space pruned by a memory bounded search strategy 
(while backtracking) may contain a large number of nodes having one or more non- 
dominated cost vectors, it is possible to back up a single cost vector from the pruned 
space and yet guarantee admissibility by using K-ordering to guide the search. 

5. The algorithm MOA**

By incorporating the ideas of K-ordering and pathmax, it is possible to design an improved 
version of the algorithm MOA* of Stewart & White (1991). The new algorithm follows 
directly from MOA* with the exceptions that the heuristic vectors are computed using 
pathmax and the selection of nodes from the list OPEN is based on the K-ordering. We shall 
refer to this algorithm as MOA**. The algorithm outline and the proof of its admissibility
are presented in appendix A.

Since the new algorithm MOA** utilizes the path information through pathmax, it follows 
from the properties of pathmax that the new algorithm will be superior to MOA* in terms 
of the worst case number of nodes expanded. In this section we discuss the advantages of 
using K-ordering in the new algorithm. 

We first show that the set of nodes expanded by MOA** does not depend on the way in
which the induced total ordering is defined. In other words, we show that if the
non-dominated nodes are selected from OPEN in any order other than K-order, the
total number of nodes expanded by MOA** in the worst case remains the same.

Theorem 4. A node n is expanded by MOA** only if n and all its ancestors have one or
more cost vectors that are either equal to or non-dominated by the cost vector of every 
solution node in the search space. If the node n and its ancestors also have at least one 
cost vector that is non-dominated by the cost vectors of every solution path not containing 
n, then the node n will definitely be expanded by MOA**. 

Proof. We prove the sufficiency condition first. If a node n and all its ancestors have one or 
more non-dominated cost vectors then step 2 of MOA** shows that the node is a candidate 
for expansion. However, if every non-dominated cost vector of n (or of one of its ancestors) 
equals the cost vector of other solution paths, then it is possible that those solution paths 
are found earlier and n is never expanded. Otherwise, it is easy to see that the node n will 
be expanded by MOA**. 

Now let us consider a node n selected for expansion in [step 3] of MOA**. It is expanded
if its representative cost vector is non-dominated by every solution node found so far. Let
us assume that this cost vector is dominated by the cost vector of some non-dominated goal
node y in the search space which has not been found so far. This leads us to a contradiction,
because until y is found, OPEN will contain either y or one of its ancestors, each of which
has one or more cost vectors that dominate or equal the cost vector of y (and therefore
dominate the representative cost vector of n); such a node would be smaller than n in
K-order and would have been selected first. The result follows by induction on the
ancestors of n. □

Based on theorem 4, we define the nodes that are surely expanded by MOA** as follows.

DEFINITION 8

Surely expanded nodes: If a node n and each of its ancestors have one or more cost vectors 
that are non-dominated by every solution path which does not contain n, then n is surely 
expanded by MOA**. 

Since theorem 4 establishes the fact that the set of surely expanded nodes will have to be 
expanded by any admissible strategy irrespective of the order of selection of non-dominated 
nodes from OPEN, it follows that in the worst case, the sets of nodes surely expanded by
MOA* and MOA** are the same. Thus the optimality results proved by Stewart & White
(1991) for MOA* extend to MOA** as well.

We now establish a property of MOA** which will be useful in describing the advantage 
of using K-ordering in partial order search. 

Theorem 5. The non-dominated solution nodes are found by MOA** in strictly increasing 
K-ordered sequence of their cost vectors. 

Proof. Let y_1 and y_2 be two non-dominated solutions in the search space such that the
cost vector of y_1 is less than the cost vector of y_2 in K-order, but y_2 is found by MOA**
before y_1. This means that when y_2 was selected from OPEN, there was no node in OPEN
having one or more cost vectors that dominate or equal the cost vector of y_1, because such
a node would have had a smaller representative cost vector than y_2. This leads to a
contradiction, because OPEN must contain either y_1 or one of its ancestors, which will
contain a cost vector that dominates or equals the cost vector of y_1. □
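Theorem 5 can be watched on a small example. The Python sketch below is a simplified K-ordered best-first search, not the MOA** of appendix A: it works on trees only, omits pathmax, assumes admissible heuristic vectors with a zero vector at every goal, and breaks ties arbitrarily. The tree, its edge costs and heuristics, and all function names are hypothetical. Following Definition 7, a node is re-queued when its representative vector changes and dropped when all of its cost vectors are dominated by solutions.

```python
import heapq
from itertools import count
from typing import Dict, List, Optional, Set, Tuple

Vec = Tuple[int, ...]


def dominates(u: Vec, v: Vec) -> bool:
    return all(a <= b for a, b in zip(u, v)) and u != v


def vadd(u: Vec, v: Vec) -> Vec:
    return tuple(a + b for a, b in zip(u, v))


def representative(F: List[Vec], sols: List[Vec]) -> Optional[Vec]:
    live = [f for f in F if not any(dominates(s, f) for s in sols)]
    return min(live) if live else None        # K-order minimum, or None


def k_ordered_search(root: str, children: Dict[str, List[Tuple[str, Vec]]],
                     H: Dict[str, List[Vec]], goals: Set[str]) -> List[Tuple[str, Vec]]:
    """Simplified K-ordered best-first tree search (no pathmax).
    Returns the solutions in the order in which they are found."""
    tie = count()                              # heap tie-breaker
    G0: Vec = (0,) * len(H[root][0])
    F0 = [vadd(G0, h) for h in H[root]]
    heap = [(min(F0), next(tie), root, G0, F0)]
    sols: List[Vec] = []
    found: List[Tuple[str, Vec]] = []
    while heap:
        rep, _, n, G, F = heapq.heappop(heap)
        cur = representative(F, sols)
        if cur is None:                        # all cost vectors dominated: prune
            continue
        if cur != rep:                         # representative moved on: re-queue
            heapq.heappush(heap, (cur, next(tie), n, G, F))
            continue
        if n in goals:                         # goal heuristics are zero, so cost = G
            sols.append(G)
            found.append((n, G))
            continue
        for m, c in children.get(n, []):
            Gm = vadd(G, c)
            Fm = [vadd(Gm, h) for h in H[m]]
            r = representative(Fm, sols)
            if r is not None:
                heapq.heappush(heap, (r, next(tie), m, Gm, Fm))
    return found


# Hypothetical two-objective tree.
children = {"s": [("a", (1, 4)), ("b", (3, 1))],
            "a": [("g1", (1, 1)), ("g2", (4, 0))],
            "b": [("g3", (1, 2)), ("g4", (2, 5))]}
H = {"s": [(0, 0)], "a": [(1, 0)], "b": [(1, 1)],
     "g1": [(0, 0)], "g2": [(0, 0)], "g3": [(0, 0)], "g4": [(0, 0)]}
print(k_ordered_search("s", children, H, {"g1", "g2", "g3", "g4"}))
# [('g1', (2, 5)), ('g3', (4, 3))]
```

Of the four solution costs (2,5), (5,4), (4,3) and (5,6), only (2,5) and (4,3) are non-dominated, and they come out in increasing K-order. The goal g4 is discarded as soon as it is generated, and g2 is discarded when it reaches the front of the queue, having been dominated in the meantime by (4,3).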
Based on the above results, the following advantages of using K-ordering become
apparent.

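A. Dominance checking in two-objective cases: Since the solution nodes are found in a
K-ordered sequence of their cost vectors, the list of solution vectors found so far is
automatically sorted in the first component. Because these solutions are mutually
non-dominated, their second components decrease along the list, so the latest solution has
the smallest second component; moreover, since the search proceeds in K-order, the first
component of a vector under test is no smaller than that of any solution found so far. In
two-objective problems (which are found to occur frequently) the task of testing a cost
vector for dominance by solutions can therefore be done in constant time simply by
comparing the second dimension of the vector with the second dimension of the latest
solution cost vector in the list of solution vectors.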

B. Computing all heuristic vectors is not necessary: If the heuristic function can
generate the heuristic cost vectors of a node in K-order then we may adopt the policy of
computing only the representative cost vector of the node at the time of its generation 
rather than computing the entire set. Since the node may be expanded by virtue of 
this representative cost vector we save the computational effort of generating the other 
heuristic vectors of the node. 
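Advantage A reduces to a single comparison. The Python sketch below assumes, as the text does, that solutions are appended in increasing K-order and are mutually non-dominated, so their second components strictly decrease; it also assumes the vector under test has a first component no smaller than that of any solution found so far. The function names and sample data are hypothetical; a brute-force check against every solution is included for comparison.

```python
from typing import List, Tuple

Vec = Tuple[float, float]


def dominates(u: Vec, v: Vec) -> bool:
    return all(a <= b for a, b in zip(u, v)) and u != v


def dominated_by_solutions(f: Vec, solutions: List[Vec]) -> bool:
    """Constant-time dominance test for two objectives.

    Assumes `solutions` is mutually non-dominated and in increasing K-order
    (so second components strictly decrease), and that f[0] is at least the
    first component of every solution found so far.
    """
    if not solutions:
        return False
    last = solutions[-1]
    return last[1] <= f[1] and last != f


sols = [(1, 9), (3, 5), (6, 2)]
for f in [(6, 2), (7, 2), (6, 1), (8, 3), (9, 1)]:
    # Agrees with the brute-force test against every solution.
    assert dominated_by_solutions(f, sols) == any(dominates(s, f) for s in sols)
print([dominated_by_solutions(f, sols) for f in [(7, 2), (9, 1)]])  # [True, False]
```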

The above advantages of using K-ordering can easily be generalized to any induced total 
order on partial order heuristic search. In the next section we shall illustrate the advantage 
of using K-ordering in memory bounded partial order search and show that in the general
situation the use of K-ordering is key to performing the search efficiently within given
memory constraints.


6. Memory bounded multiobjective search 

The exorbitant space requirements of search strategies such as A* have motivated
researchers to develop schemes for searching within memory constraints. In the past decade
several memory bounded heuristic search strategies have been developed for the general
search model. The linear space iterative deepening algorithm IDA* of Korf (1985) is one
of the foremost among such strategies. The algorithm MA* of Chakrabarti et al (1989)
uses the given memory to retain the best portion of the state space and thereby reduce the
number of node re-expansions. The algorithm IDA*_CR of Sarkar et al (1991) and the
algorithm DFS* of Rao et al (1991) use suitable techniques to determine the cost cut-off
of the next iteration so as to reduce the number of node re-expansions. Other important
contributions in the area of restricted memory search include the strategies MREC
of Sen & Bagchi (1989), IE and SMA* of Russell (1992), RBFS of Korf (1992, 1993) and
IDA*_CRM of Sarkar et al (1992).

In this section we show that the main difficulty in extending standard memory bounded 
search techniques from the general search model to the multiobjective search model lies 
in the cost back-up mechanism. We further show that the policy of using K-ordering can 
resolve this problem. This fact is established by incorporating K-ordering in a
multiobjective search algorithm and showing that the algorithm is admissible and operates
in linear space.

6.1 Cost back-up and K-ordering

A standard feature of all asymptotically optimal memory bounded search strategies is the
backing up of costs while backtracking. When such algorithms backtrack from the
current path to select a better path, they typically back up the cost of the current path before 
pruning it, so that the same path is selected only if all other paths of lower cost have been 
visited. Thus before removing a portion of the explicit search space from the memory the 
algorithm ensures that the best cost from the pruned space is backed up. 

Whereas in the conventional search model it is possible to identify a single undisputed 
best cost from the pruned space by using the total order on the cost structure, the same 
is not possible in the multiobjective model due to the partial order on the cost vectors 
which allows the existence of multiple non-dominated cost vectors in the pruned space. 
Since the number of such vectors may be quite large, in general it will not be possible to
back up the entire set of non-dominated vectors due to space limitations. Therefore it is
necessary to adopt some technique so that not many cost vectors need to be backed up and 
yet admissibility is guaranteed. We show that the policy of using an induced total ordering 
(such as K-ordering) to decide the direction of the search is one such technique to resolve 
the problem. 

In the following sub-sections we shall show that if the policy of searching in a K-ordered
best-first manner is adopted, then it is possible to back up only a single cost vector from the
pruned space while backtracking. In order to implement this policy over the entire search 
mechanism (which includes the task of updating the cost vectors of regenerated nodes 
from the backed up cost vectors) we define a function called Minf which assigns updated 
cost vectors to all nodes in the explicit search space and ensures that the search proceeds 
in a K-ordered best-first manner. 

Once the policy of using K-ordering is adopted, it becomes possible to extend standard 
memory bounded search techniques to the multiobjective search framework. We present 
a linear space algorithm called MOR*0 which incorporates the idea of K-ordering with 
standard memory bounded search techniques. 

In this context it must be noted that the admissibility of MOR*0 only establishes the
result that it is possible to guarantee admissibility by backing up a single cost vector. While
this result guarantees the feasibility of MOR*0 in all situations, it does not necessarily
imply that the policy of backing up more than one cost vector from the pruned space is 
inferior to the policy of backing up a single cost vector. In fact, when sufficient memory 
is available, the policy of backing up more than one cost vector can typically reduce the 
number of node re-expansions during search. Thus there is a scope for trade-off between 
the number of costs to be backed up and the advantage gained due to reduction in the 
number of node re-expansions. These issues are considered in § 6.4 where a variant of the 
proposed algorithm is suggested that backs up more than one cost vector from the pruned 
space while backtracking. 

6.2 General philosophy of MOR*0

The algorithm MOR*0 is a generalization of restricted memory heuristic search techniques
(in the conventional search model) with suitable features incorporated to adapt it to the
multiobjective framework. In this section we describe the general structure of the nodes
and the basic features of the algorithm. In the following discussion we loosely use the term
“minimum” to refer to “minimum in K-order”. Also, when we say that a cost vector is
greater or less than another vector, we mean that the comparison is on the basis of K-order.

Figure 2. The structure of a node.

The structure of a node is shown in figure 2. H(n) denotes the set of heuristic vectors
of node n computed using pathmax. Each node n maintains a list {F_1(n), F_2(n), ..., F_M(n)}
of cost vectors, where M denotes the number of successors of n and F_i(n) denotes the
minimum of the estimated cost vectors of solution paths through node n and its i-th child.
F(n) denotes the minimum of {F_i(n)} and represents the minimum estimated cost vector of
solution paths through n. F(n) is called the representative cost vector of node n.

The basic features of the algorithm MOR*0 are as follows. 

Use of GL: Algorithm MOR*0 uses a vector called GL (greatest lower bound), which
holds the cost vector of the current best path. When a new path is selected, GL denotes 
the estimated promise of the path. 

Use of NEXT_MIN: When MOR*0 extends the most promising path, it maintains the next
best promise in a vector called NEXT_MIN. The selected path is extended until its
cost vector exceeds the cost vector stored as NEXT_MIN. When this happens, MOR*0
backtracks up to the node where the cost vector in NEXT_MIN is backed up.

Cost back-up: When MOR*0 backtracks it backs up only the minimum (in K-order) of the 
cost vectors in the current path. 

Backtracking and cost revision: If the cost vector of the current search path exceeds GL, 
and there exists a better alternative path having one or more cost vectors that are less 
than the cost vector of the current path, then MOR*0 backtracks as follows. 

Let p be the tip node of the current path. Let q be the parent of p, and let p be its
j-th child. The value of F(p) is backed up in F_j(q), and F(q) is revised to the
minimum among all vectors F_i(q) backed up at q. If this revision alters the
vector F(q), then F(q) is backed up to its parent, and so on, until either F(n) is
unaltered for some node n in the path or the root node is reached.

We refer to this process of backing up of cost vectors as cost revision. 
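The cost revision just described can be sketched in Python. The node record follows figure 2 only loosely (one F_i(n) slot per child plus a parent link, with 0-based slots), and the path and its stored vectors are hypothetical values. They are chosen so that, as in Example 1 of § 6.3, the vector (7,8,10) ends up in the first slot of n_1.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Vec = Tuple[float, ...]


@dataclass
class Node:
    name: str
    F_child: List[Vec]                 # F_i(n): one backed-up vector per child slot
    parent: Optional["Node"] = None
    slot: int = -1                     # j such that this node is the j-th child (0-based)

    @property
    def F(self) -> Vec:
        return min(self.F_child)       # representative vector: K-order minimum


def cost_revision(p: Node, F_p: Vec) -> List[str]:
    """Back F(p) up into F_j(q) of its parent q; keep going up while F changes.
    Returns the names of the ancestors whose F was revised."""
    revised: List[str] = []
    child, value = p, F_p
    while child.parent is not None:
        q = child.parent
        old = q.F
        q.F_child[child.slot] = value
        if q.F == old:                 # F(q) unaltered: stop propagating
            break
        revised.append(q.name)
        child, value = q, q.F
    return revised


s = Node("s", [(6, 9, 10), (10, 10, 10)])
n1 = Node("n1", [(6, 9, 10), (7, 8, 8)], parent=s, slot=0)
n3 = Node("n3", [(7, 8, 10), (6, 9, 10)], parent=n1, slot=0)
n5 = Node("n5", [(14, 14, 14), (14, 14, 14)], parent=n3, slot=1)

print(cost_revision(n5, n5.F))   # ['n3', 'n1', 's']
print(n1.F_child[0], n1.F)       # (7, 8, 10) (7, 8, 8)
```

A second call with the same value changes nothing at n3, so the loop stops immediately and returns an empty list.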

Use of Minf: When a node n is expanded to generate its children m_i, the set of cost vectors
of m_i is computed by adding G(m_i) to each heuristic vector in H(m_i). However, in
the cases where m_i is being re-generated, F(m_i) should be assigned a cost vector that is
consistent with the cost vector that was backed up previously as F_i(n). This assignment
is done by a function Minf which takes two aspects into account.

1. F(m_i) must be assigned a value greater than or equal to F_i(n).

2. The value assigned to F(m_i) must be consistent with the cost vectors computed at
node m_i, that is, it must be dominated by at least one cost vector of node m_i.

The function Minf determines the minimum vector which is greater than or equal to
F_i(n) and which is dominated by at least one of the cost vectors of m_i, and assigns this
vector to F(m_i).

Test for dominance: If the vector F(m) of node m is dominated by the cost vector of some
solution found previously, then the dominated heuristic vectors in H(m) are removed.
If H(m) becomes empty, then F(m) is set to infinity, and the algorithm backtracks.
Otherwise, F(m) is assigned a new vector by applying Minf on F_i(n) and the new H(m),
where n is the parent of m, and m is the i-th successor of n.

Since the search proceeds in K-order, it is easy to see that the first dimension of the 
representative cost vector of a node is greater than or equal to the first dimension of 
the cost vector of every solution found so far. Therefore, while testing for dominance 
against the cost vector of a solution, we ignore the first dimension. 

Termination condition: When a non-dominated solution node is found, its path to the 
source is traced and the cost vector of the path is saved. Then the cost vector of the 
node is set to infinity, so that the algorithm backtracks and proceeds along alternative 
paths for other solutions. When all the cost vectors of a path become dominated by the
cost vectors of solutions, infinity is backed up along that path. The algorithm terminates
when infinity is backed up along all paths.

6.3 Algorithm MOR*0

The outline of the algorithm in recursive form is given below. The min operator refers to 
the minimum in K-order. 

MOR*0(node: n, cost: GL, cost: NEXT_MIN)

1. If F(n) > GL return F(n)

2. Test for dominated heuristics

   2.1 Remove all dominated heuristics in H(n)

   2.2 If H(n) is empty, return ∞

   2.3 Recalculate F(n) ← Minf(GL, G(n), H(n))

   2.4 If F(n) increases then return F(n)

3. IF n is a goal node THEN

   3.1 Put n in SOLUTION_GOALS and its cost in SOLUTION_COSTS

   3.2 Output the solution path and return ∞

4. Expand n, generating all its successors.

   4.1 For each successor m_i of n

       4.1.1 Evaluate the set H(m_i) of non-dominated heuristics

       4.1.3 Calculate G(m_i) ← G(n) + c(n, m_i),
             where c(n, m_i) is the cost of the arc (n, m_i).

       4.1.4 Use the function Minf to evaluate the value of F(m_i)

       4.1.5 Set F_i(m_i) ← F(m_i), ∀i

       4.1.6 Set F_i(n) ← F(m_i)

   4.2 Set F' ← min{F_i(n), ∀i}.

5. Let m_j be the successor with cost F'. Set F'' ← min{F_i(n), ∀i, i ≠ j}.

   5.1 F_j(n) ← MOR*0(m_j, GL, min(NEXT_MIN, F''))

   5.2 Set F' ← min{F_i(n), ∀i}.

   5.3 Set MIN ← min(NEXT_MIN, F')

   5.4 If F' = MIN THEN

       5.4.1 Set GL ← NEXT_MIN

       5.4.2 Go to [Step 5]

       Else return F'

6.3a The function Minf: Whenever the i-th successor m of node n is generated, F(m)
is assigned the minimum vector which is greater than or equal to F_i(n) and which is
dominated by at least one of the cost vectors of m. The outline of the function is given
below:

Minf(F_i(n), G(m), H(m))

Let f_j(n) denote the j-th dimension of F_i(n).

1. Construct a set F(m) in the following way:

   1.1 For each heuristic h(m) ∈ H(m), create a new vector f(m):

       1.1.1 f(m) ← G(m) + h(m)

             Let f_j(m) denote the j-th dimension of f(m).

       1.1.2 For j = 1 to j = K

             If f_j(n) ≥ f_j(m) Then Set f_j(m) ← f_j(n)

             Else Go to [Step 1.1.3]

       1.1.3 Put f(m) in F(m)

2. Return the minimum vector (in K-order) from F(m).

End.
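The outline above can be turned into the following runnable Python sketch, with cost vectors as tuples and K-order as Python's tuple order. One reading is made explicit: when a component of f(m) equals the corresponding component of F_i(n), the loop continues rather than stopping, because stopping there could return a vector smaller than F_i(n) in K-order (for instance F_i(n) = (7,8,10) against f(m) = (7,5,3)), contrary to the stated requirement. The function name is hypothetical.

```python
from typing import List, Tuple

Vec = Tuple[float, ...]


def minf(F_i_n: Vec, G_m: Vec, H_m: List[Vec]) -> Vec:
    """Minf(F_i(n), G(m), H(m)): the K-order minimum over h in H(m) of the
    smallest vector that is >= F_i(n) in K-order and componentwise >= G(m) + h.
    H(m) must be non-empty (step 2.2 of MOR*0 returns infinity otherwise)."""
    candidates: List[Vec] = []
    for h in H_m:
        f = [g + x for g, x in zip(G_m, h)]   # step 1.1.1: f(m) = G(m) + h(m)
        for j in range(len(f)):               # step 1.1.2
            if F_i_n[j] >= f[j]:
                f[j] = F_i_n[j]               # raise component j to match F_i(n)
            else:
                break                         # f(m) is now larger than F_i(n) in K-order
        candidates.append(tuple(f))           # step 1.1.3
    return min(candidates)                    # step 2: tuple order is K-order


# Re-expansion of n_3 in Example 1, with G(m) taken as zero for simplicity.
print(minf((7, 8, 10), (0, 0, 0), [(6, 9, 10)]))   # (7, 9, 10)
print(minf((7, 8, 10), (0, 0, 0), [(7, 8, 10)]))   # (7, 8, 10)
```

The two calls reproduce the values used in Example 1 below: F(n_5) is raised from (6,9,10) to (7,9,10), while F(n_4) = (7,8,10) is left unchanged.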

The following example illustrates the working of the algorithm MOR*0 and justifies the
use of the function Minf. To demonstrate the basic idea we consider a simple case where
each node has a single cost vector.

Example 1. We show the operation of MOR*0 on the tree of figure 3. Since H(s) is (3,2,2),
the algorithm starts with GL as (3,2,2). Node s is expanded to generate node n_1 of cost
(4,3,4) and node n_2 of cost (10,10,10).

Then node n_1 is selected. The value of GL now becomes (4,3,4), and NEXT_MIN is
assigned the value (10,10,10). Node n_1 is expanded to generate node n_3 of cost (6,7,7) and
node n_8 of cost (7,8,8). The cost of the path has exceeded GL, but it is still less than any
other alternative since NEXT_MIN is (10,10,10).

Figure 3. Tree illustrating the utility of Minf.

Therefore n_3 is selected. The value of GL becomes (6,7,7), and NEXT_MIN becomes
(7,8,8). Node n_3 is expanded to generate node n_4 of cost (7,8,10) and node n_5 of cost
(6,9,10).

Node n_5 is selected. The value of GL becomes (6,9,10), and NEXT_MIN remains (7,8,8)
since the alternative cost F(n_4) is worse than (7,8,8). Now node n_5 is expanded to generate
nodes n_6 and n_7, both of cost (14,14,14). Now the cost of the path has exceeded not only
GL but also the value of NEXT_MIN. The algorithm therefore backtracks to the node which
holds the alternative promised by NEXT_MIN, i.e. the algorithm backtracks to node n_1
since F_2(n_1) is equal to NEXT_MIN. The best cost from the current path, that is, (7,8,10),
is backed up at F_1(n_1).

Now n_8 is selected. The value of GL becomes (7,8,8), and NEXT_MIN becomes (7,8,10),
which incidentally is the backed up cost from n_3. Node n_8 is now expanded to generate
nodes n_9 and n_10, both of cost (13,13,13). The algorithm backtracks again to node n_1,
and selects n_3 on the basis of the backed up cost (7,8,10), which is the current value of
NEXT_MIN. The cost (13,13,13) is backed up from node n_8. The value of GL becomes
(7,8,10), and NEXT_MIN becomes (13,13,13).

In the next step, n_3 is re-expanded. If the costs are evaluated without using Minf, then
F(n_4) is evaluated as (7,8,10) and F(n_5) is evaluated as (6,9,10) (the same values as those
evaluated when they were generated for the first time). In that case node n_5 will be selected
and re-expanded.

However, using Minf, F(n_4) is evaluated as (7,8,10) and F(n_5) is evaluated as (7,9,10).
Therefore, n_4 is selected and expanded. Subsequently F(n_11) is evaluated as (7,8,10), and
the solution is obtained. It is easy to see that this cost dominates all other costs in the
explicit search space, and therefore the algorithm terminates without expanding any other
node. The unnecessary re-expansion of node n_5 is therefore avoided by virtue of using the
function Minf. □

6.3b Admissibility of MOR*0: In this section, we prove the admissibility of MOR*0
and establish the equivalence of MOR*0 and MOA** in terms of the set of nodes expanded 
by them. In the following analysis when the terms greater, less, minimum, maximum etc. 
are used with respect to vectors, we mean that the comparison is on the basis of K-order. 

Lemma 1. Whenever a node containing the backed-up cost vector of a previously
generated (and consequently pruned) path is selected, MOR*0 will expand at least one new
node.

Proof. MOR*0 continues the extension of a path P until the representative cost vector of
the extended path exceeds the cost vector of the next best alternative. When this happens,
MOR*0 backtracks up to the node which has this next best promise (now the best). Let
this node be t and let u be the child of t in the extended path P'. Since MOR*0 backs up
the F-value while backtracking, F_i(t) = F(u), u being the i-th successor of t. The backed
up cost vector F_i(t) must have exceeded the previous backed-up vector (otherwise the
algorithm would not have backtracked). Clearly this backed up cost vector is the
representative cost vector of the tip node n of some path generated by extending the path
P. When the path P is selected again, the node n is guaranteed to be expanded. □

Lemma 1 shows that the same path will not be repeatedly re-expanded without any progress. 
Using this, it is now easy to establish the termination of MOR*0. In order to prove that
MOR*0 is admissible we first show that it expands the same set of nodes as MOA**. 

Lemma 2. MOR*0 always expands the node n with the minimum F-value F(n). Ties are
resolved in favor of nodes at a greater depth.

Proof. The proof follows from the way NEXT_MIN is maintained and the policy of backtracking
to the deepest node having a backed-up cost vector equal to NEXT_MIN. □

Lemma 3. Except for differences due to tie resolutions between nodes having representative
cost vectors equal to non-dominated solution cost vectors, the representative cost
vectors of new nodes expanded by MOR*0 occur in the same sequence as the representative
cost vectors of nodes expanded by MOA**.

Proof. MOA** always selects the node having the minimum representative cost vector 
in the explicit search space. We show that each new node expanded by MOR*0 is the 
one having the minimum representative cost vector among all nodes generated up to that 
point. 

When MOR*0 generates a node n for the first time, F(n) is the same as the representative
cost vector from its set of cost vectors. Since F_i(n) denotes the representative cost vector
of the tip node of some path through n and its ith successor, and F(n) is the minimum
among all such F_i(n), it follows that F(n) is actually the minimum among the cost vectors
of all leaf nodes in the subtree below node n.

MOR*0 always selects the node n with the minimum F(n) and extends the path until 
its cost increases. Clearly F(n) is the minimum representative cost vector among all 
nodes generated so far. During the course of regeneration there can be no node having 
a representative cost vector less than F(n) (this is guaranteed by the use of pathmax and 
the function Minf). Also no node with a representative cost vector greater than F(n) will 
be selected until the path having the cost vector F(n) is traced. Thus the first new node 
that is expanded after selection of node n is the one having the minimum representative 
cost vector among all nodes generated so far. 

Tie resolutions will occur among nodes having this minimum cost vector. If this cost 
vector is not equal to the cost vector of a non-dominated solution then each of these nodes 
will be expanded by both algorithms. However, if the cost vector equals that of a non- 
dominated solution, then depending on the tie resolutions one algorithm may select the 
solution earlier and prune the other nodes of equal cost. In the worst case situation, the 
solution node may be selected last. □ 

Lemma 3 establishes the equivalence of MOR*0 and MOA** in terms of the worst 
case set of nodes expanded by them. The admissibility of MOR*0 follows from this 
result. 

Theorem 6. Algorithm MOR*0 is admissible, that is, it terminates with all non-dominated
solutions. 

Proof. The proof follows from lemmas 1 & 3. □ 

6.4 Variants of MOR*0

The main drawback of MOR*0 is stated in the following lemma.

Lemma 4. If the number of nodes expanded by MOA** is A, then algorithm MOR*0
expands O(A^2) nodes in the worst case.

Proof. In the worst case, after expanding each new node MOR*0 may have to backtrack
right up to the source node and regenerate the entire portion of the search space that it had
pruned earlier before expanding the next new node. Under such a situation it is easy to see
that MOR*0 will expand O(A^2) nodes. □
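As a worked check of the count in the proof, assume, as the worst case above does, that before the ith new node is expanded MOR*0 regenerates all i - 1 nodes expanded earlier. The total number of expansions is then:

```latex
\sum_{i=1}^{A}\bigl[(i-1) + 1\bigr] \;=\; \sum_{i=1}^{A} i \;=\; \frac{A(A+1)}{2} \;=\; O(A^2)
```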

Lemma 4 reflects a standard drawback of many recursive search strategies. Similar 
results have been obtained for algorithms such as IDA* (Korf 1985), prompting researchers 
to develop techniques such as IDA*_CR (Sarkar et al 1991) and DFS* (Rao et al 1991) 
to reduce the number of node re-expansions in recursive search strategies. In this section 
we shall suggest extensions of similar techniques to improve the performance of MOR*0. 
We also discuss a scheme (in the following sub-section) that is applicable only to the 
multiobjective framework. 
6.4a MOR*0 with multiple back-up cost vectors: When the algorithm MOR*0 backtracks,
it backs up only the minimum cost vector (in K-order) from the pruned space. If
some solution found during the course of the search dominates the backed-up cost
vector, then the vector becomes useless. Since no other backed-up cost vector is available,
in the worst case the algorithm may have to explore the same portion of the search space
again. The policy of backing up multiple non-dominated cost vectors has the following
advantage:

• If all cost vectors are backed up and all of them are subsequently found to be dominated
by one or more solutions, then it is possible to prune the portion of the search space
below that node forever and save many node re-expansions. Also, if more
than one cost vector is backed up, then whenever the representative cost vector becomes
dominated, the node may be represented by the next minimum backed-up cost vector.

Backing up multiple non-dominated cost vectors also has the following disadvantages. 

• Since algorithm MOR*0 proceeds in K-ordered best-first fashion, at each node it needs to compute
only the minimum non-dominated cost vector. Computing multiple cost vectors may
be expensive for certain heuristic functions.

• The space constraints may prohibit the backing up of multiple cost vectors. 

Therefore, depending on the situation, there is scope for a trade-off between the number of
cost vectors to be backed up and the advantage gained from the reduction in the number of nodes
expanded.
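A minimal sketch of the multiple back-up policy, assuming the backed-up vectors of a pruned node are stored as a sorted list of mutually non-dominated vectors. K-order is defined earlier in the paper and is not reproduced here; Python's lexicographic tuple order is used as a hypothetical stand-in (it is consistent with dominance). The class and method names are ours. When the current representative becomes dominated by a solution, the next surviving vector takes over; `None` signals that the subtree can be pruned for good.

```python
def dominates(a, b):
    """True if a dominates b: no worse in every component and not equal."""
    return all(x <= y for x, y in zip(a, b)) and a != b


class BackedUpCosts:
    """Hypothetical store for the backed-up cost vectors of one pruned node.

    Vectors are kept mutually non-dominated and sorted; lexicographic tuple
    order stands in for K-order.
    """

    def __init__(self, vectors):
        unique = set(vectors)
        self.vectors = sorted(v for v in unique
                              if not any(dominates(u, v) for u in unique))

    def representative(self, solution_costs):
        """Discard vectors dominated by a found solution; return the minimum
        survivor, or None if every backed-up vector is dominated."""
        self.vectors = [v for v in self.vectors
                        if not any(dominates(s, v) for s in solution_costs)]
        return self.vectors[0] if self.vectors else None


node = BackedUpCosts([(3, 7), (5, 5), (6, 6), (7, 3)])
print(node.vectors)                               # [(3, 7), (5, 5), (7, 3)]
print(node.representative([(2, 6)]))             # (5, 5)
print(node.representative([(4, 4), (6, 2)]))     # None: prune the subtree
```

Storing the whole list is exactly the space cost listed among the disadvantages above; a bounded variant could simply truncate `self.vectors` to a fixed length.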

6.4b MOR*0 with controlled re-expansions: In this section we suggest the extension
of an established technique from the conventional search model for controlling node
re-expansions in backtracking heuristic search strategies. In the conventional model the
following result has been established (Korf 1985) for strategies such as IDA*:

• If for some constant b (such that b^K = N for some integer K), b^i new nodes are
expanded in the ith iteration, then the worst-case complexity of the strategy reduces
from O(N^2) to O(N).
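A small counting sketch makes the contrast concrete. It assumes, as in the worst case of Lemma 4, that every iteration first re-expands all nodes expanded in earlier iterations and then expands its new ones; iterations are indexed from 0 here, and the function name is ours.

```python
def total_expansions(new_per_iteration):
    """Total node expansions when each iteration first re-expands every node
    expanded in earlier iterations and then expands its new nodes."""
    total = seen = 0
    for new in new_per_iteration:
        seen += new       # distinct nodes expanded so far
        total += seen     # work done in this iteration
    return total


b, K = 2, 10
geometric = [b ** i for i in range(K)]   # b^i new nodes in iteration i
N = sum(geometric)                        # 1023 distinct nodes
print(total_expansions([1] * N))          # one new node per iteration: 523776
print(total_expansions(geometric))        # geometric growth: 2036
```

With b = 2 the geometric schedule stays within b/(b - 1) = 2 times the 1023 distinct nodes, whereas adding one new node per iteration is quadratic.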

Based on this result, the algorithm IDA*_CR was developed by Sarkar et al (1991). In each
iteration i, IDA*_CR performs a depth-first branch and bound with a cutoff value x, which
is set so as to ensure that at least b^i new nodes are expanded. To determine the
cutoff value for the next iteration, IDA*_CR uses a bucketing technique for grouping the
node costs that exceed the cutoff of the current iteration.
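The exact bucketing scheme of IDA*_CR is given in Sarkar et al (1991) and is not reproduced here. The following is only a hypothetical scalar sketch of the idea: costs that exceeded the current cutoff are grouped into equal-width buckets, buckets are accumulated from the cheapest one until `target` (for example b^i) costs are covered, and the largest cost actually seen in those buckets becomes the next cutoff. Returning an observed cost rather than a bucket edge makes the "at least `target`" guarantee exact. `next_cutoff` and `num_buckets` are our names.

```python
import math


def next_cutoff(exceeded_costs, target, num_buckets=50):
    """Choose a cutoff so that at least `target` of the costs that exceeded the
    current cutoff lie at or below the new one (or all of them, if fewer exist)."""
    if not exceeded_costs:
        return math.inf                      # nothing lies beyond the cutoff
    lo, hi = min(exceeded_costs), max(exceeded_costs)
    if lo == hi:
        return hi
    width = (hi - lo) / num_buckets
    counts = [0] * num_buckets
    bucket_max = [-math.inf] * num_buckets
    for c in exceeded_costs:
        idx = min(int((c - lo) / width), num_buckets - 1)   # hi goes in the last bucket
        counts[idx] += 1
        bucket_max[idx] = max(bucket_max[idx], c)
    running, cutoff = 0, -math.inf
    for count, top in zip(counts, bucket_max):
        running += count                     # costs covered so far
        cutoff = max(cutoff, top)            # largest covered cost
        if running >= target:
            return cutoff
    return hi
```

For example, `next_cutoff([10, 11, 12, 15, 20, 30, 31, 50], 3, num_buckets=4)` puts 10, 11, 12 and 15 in the first bucket and returns 15.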

A similar technique may be used to reduce the number of node re-expansions in MOR*0. 
We may define an iteration to be the set of operations between successive assignments of 
new values to GL. An iteration ends when there are no more paths with cost less than or 
equal to the current value of GL (in K-order). By using a suitable bucketing technique 
it is possible to group the representative cost vectors of nodes which exceeded GL in an 
iteration. The extended algorithm then simply assigns GL a new vector such that in the ith
iteration GL exceeds (or equals) the representative cost vectors of at least b^i new nodes.
After a new vector is assigned to GL, this algorithm searches the space below the tip node
of the current path with the new GL as cutoff. The remaining operations are identical to
those of MOR*0.
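For vectors, the choice of the next GL can be sketched in the same spirit. For brevity this hypothetical helper replaces bucketing with a full sort of the representative cost vectors that exceeded GL and returns the one at position `target` (for example b^i) in the total order. The `key` parameter is an assumption of this sketch: it should realize K-order, and the default falls back to lexicographic tuple order as a stand-in.

```python
def next_gl(exceeded_vectors, target, key=None):
    """Return a new GL such that at least `target` of the representative cost
    vectors that exceeded the old GL are <= the new GL in the chosen order
    (all of them, if fewer than `target` exist). Returns None if there are none.

    key: sort key realizing the total order; None uses lexicographic tuple
    order as a stand-in for K-order.
    """
    if not exceeded_vectors:
        return None
    ordered = sorted(exceeded_vectors, key=key)
    k = min(max(target, 1), len(ordered))
    return ordered[k - 1]
```

Sorting costs O(m log m) for m exceeded vectors; a bucketing scheme as in IDA*_CR would avoid the full sort at the price of a coarser GL.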
6.4c The algorithm MOR*: One way of controlling excessive node regeneration is to
prune nodes only when the given memory is exhausted. In the conventional search model
the algorithm MA* (Chakrabarti et al 1989) uses such an idea. The algorithm MOR* is an
extension of MA* to the multiobjective framework with suitable modifications. The algorithm
makes use of K-order to prune away the nodes having the maximum representative
cost vector in K-order. The operation of the algorithm is similar to that of MOR*0 except
that the pruning and cost back-up mechanism is initiated only when the given memory
(which is specified through a parameter called MAX) is exhausted. The outline of this
algorithm is given in appendix B.

The admissibility of MOR* follows from the admissibility of MOR*0. In addition, the
algorithm has the following property, which we state here without proof.

• The algorithm MOR* never uses more memory than MAX + |C*|, where MAX is the
given parameter depicting the additional amount of memory available and |C*| is the
number of nodes in the longest non-dominated path in the search space.

7. Searching with inadmissible heuristics 

The admissibility and optimality properties of best-first search algorithms such as A* and 
MOA* are subject to the admissibility of the heuristic function. Though an inadmissible
heuristic function cannot guarantee admissibility, search algorithms using
such heuristics are sometimes found to converge to a solution much faster than those using weaker
admissible heuristics. The problem of searching with inadmissible heuristics has been
well studied within the conventional search model (Bagchi & Mahanti 1983; Dechter
& Pearl 1985; Pearl 1984). This section presents preliminary results obtained by us on
multiobjective search with inadmissible heuristics.

In order to avoid exhaustive search of the state-space, the following basic policies are 
adopted by strategies such as MOA* and MOA**. 

• The node which is selected from OPEN for expansion must be non-dominated in OPEN. 

• If every cost vector of a node is dominated by the cost vectors of already obtained 
solutions, then the node will never be selected for expansion. 
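The two policies can be sketched as filters over OPEN. This is a minimal illustration, assuming OPEN is a dict mapping node names to lists of cost vectors and reading "non-dominated in OPEN" as "has at least one cost vector that no vector of another OPEN node dominates"; the function names are ours.

```python
def dominates(a, b):
    """True if a dominates b: no worse in every component and not equal."""
    return all(x <= y for x, y in zip(a, b)) and a != b


def non_dominated_in_open(open_nodes):
    """Policy 1: nodes having at least one cost vector that no cost vector of
    another OPEN node dominates. open_nodes maps node -> list of cost vectors."""
    return sorted(n for n, vs in open_nodes.items()
                  if any(not any(dominates(u, v)
                                 for m, us in open_nodes.items() if m != n
                                 for u in us)
                         for v in vs))


def drop_dominated_by_solutions(open_nodes, solution_costs):
    """Policy 2: keep only nodes with some cost vector not dominated by a solution."""
    return {n: vs for n, vs in open_nodes.items()
            if any(not any(dominates(s, v) for s in solution_costs) for v in vs)}


open_nodes = {"a": [(6, 4)], "b": [(4, 6)], "c": [(7, 7)]}
print(non_dominated_in_open(open_nodes))                   # ['a', 'b']
print(drop_dominated_by_solutions(open_nodes, [(3, 5)]))   # {'a': [(6, 4)]}
```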

If the above policies are not adopted then it may be easily shown that the search turns 
out to be exhaustive in many situations. We now show that if the heuristic function is 
inadmissible, then admissibility of the algorithm cannot be guaranteed without relaxing 
the above policies. 

Theorem 7. If the heuristic function is inadmissible then no algorithm is guaranteed to 
find all non-dominated solutions unless it expands dominated nodes also. 


Proof. We prove the result by constructing a problem instance where the expansion of a
dominated node becomes mandatory. Consider the tree in figure 4. At first node s
is expanded to generate node n1 having the single cost vector (6,4) and node n2 having
the single cost vector (4,6). Since F(n1) and F(n2) are mutually non-dominated, either of
them may be selected first by the search strategy. Let us consider both cases.
Figure 4. Multiobjective tree with inadmissible heuristics.


1. If n1 is selected, then the goal node n3 of cost vector (3,5) is generated. Since F(n3)
dominates F(n2), n3 is selected next and the solution of cost vector (3,5) is recorded.
This cost vector dominates F(n2). Therefore any strategy that adopts the policy of
not expanding dominated nodes will not expand n2 and so will never find the other
non-dominated goal node of cost (5,3).

2. If n2 is selected prior to n1, then by similar reasoning the non-dominated solution n3
will never be found unless the dominated node n1 is expanded.

Thus, unless the search strategy expands a dominated node, it will not be able to find all 
non-dominated solutions. □ 
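The counterexample can be replayed with a small simulation. The tree is the one described in the proof; the goal below n2 is unnamed in the text, so the sketch calls it n4 (a hypothetical label). The search always picks a node non-dominated in OPEN, uses `first_choice` to break the initial tie between n1 and n2, and never expands a node dominated by a recorded solution.

```python
def dominates(a, b):
    """True if a dominates b: no worse in every component and not equal."""
    return all(x <= y for x, y in zip(a, b)) and a != b


# Tree of figure 4 as described in the proof; the name n4 is ours.
CHILDREN = {"s": ["n1", "n2"], "n1": ["n3"], "n2": ["n4"], "n3": [], "n4": []}
COST = {"n1": (6, 4), "n2": (4, 6), "n3": (3, 5), "n4": (5, 3)}
GOALS = {"n3", "n4"}


def search(first_choice):
    """Best-first search that never expands a node dominated by a found solution."""
    open_nodes = list(CHILDREN["s"])
    found = []
    while open_nodes:
        candidates = [n for n in open_nodes
                      if not any(dominates(COST[m], COST[n]) for m in open_nodes)]
        n = first_choice if first_choice in candidates else candidates[0]
        open_nodes.remove(n)
        if any(dominates(c, COST[n]) for c in found):
            continue                      # dominated nodes are never expanded
        if n in GOALS:
            found.append(COST[n])
            continue
        open_nodes.extend(CHILDREN[n])
    return found


print(search("n1"))   # [(3, 5)]: the goal of cost (5, 3) is missed
print(search("n2"))   # [(5, 3)]: the goal n3 of cost (3, 5) is missed
```

Either tie resolution finds exactly one of the two non-dominated solutions, as the theorem states.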

The above result also shows that depending upon the policy adopted by an algorithm 
to decide which nodes to expand first, different sets of solution nodes will be discovered 
by the search. In this section we analyze the conditions for a given solution path to be 
discovered by MOA**. In the following discussion when the terms greater, less, minimum, 
maximum etc. are used with respect to vectors, we mean that the comparison is on the basis 
of K-order. 

DEFINITION 9 

[pmax]: We define a function called “pmax” as follows. Given a path P, pmax[P] returns
the maximum of the representative cost vectors of all nodes on the path P. We shall also
refer to the vector returned by pmax[P] as “pmax of P”. □

DEFINITION 10 

[pmax-ordering]: We define a strict ordering on the search paths as follows. We shall
refer to this ordering as “pmax-ordering”.

Given any two paths P1 and P2:

1. If pmax[P1] < pmax[P2] then P1 < P2 based on pmax-ordering.

2. If pmax[P2] < pmax[P1] then P2 < P1 based on pmax-ordering.

3. If pmax[P1] = pmax[P2] then

3.1 Let n1 ∈ P1 and n2 ∈ P2 be the shallowest nodes in P1 and P2 respectively
such that the representative cost vector of n1 equals pmax[P1]
and the representative cost vector of n2 equals pmax[P2].

3.2 If n1 and n2 are the same node then

If the node is a tip node then

P1 < P2 or P2 < P1 (arbitrarily, based on the tie-breaking rules).

else

3.2.1 Let P1' be the subpath of P1 from n1 onwards
and P2' be the subpath of P2 from n2 onwards.

3.2.2 If P1' < P2' on pmax-ordering
then P1 < P2 on pmax-ordering.

3.2.3 If P2' < P1' on pmax-ordering
then P2 < P1 on pmax-ordering.

else when n1 and n2 are different nodes then

P1 < P2 or P2 < P1 (arbitrarily, based on the tie-breaking rules). □
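The two definitions can be sketched as a comparison function under explicit assumptions. Paths are lists of (node, representative vector) pairs from the root, and lexicographic tuple order stands in for K-order. "The subpath from n1 onwards" is read here as the part strictly below the shared node, since recursing on a subpath that still starts with that node would compare the same maximal node again. "Tip node" is read as the shared node being the last node of either path. The function returns 0 wherever the definition defers to tie-breaking rules; both function names are ours.

```python
def pmax(path):
    """Maximum representative cost vector on a path of (node, vector) pairs.
    Lexicographic tuple order stands in for K-order."""
    return max(vec for _, vec in path)


def pmax_compare(p1, p2):
    """-1 if p1 precedes p2 in pmax-ordering, 1 if p2 precedes p1, and 0 where
    the definition leaves the choice to the tie-breaking rules."""
    m1, m2 = pmax(p1), pmax(p2)
    if m1 != m2:                                            # rules 1 and 2
        return -1 if m1 < m2 else 1
    # rule 3.1: shallowest node on each path whose vector equals pmax
    i1 = next(i for i, (_, vec) in enumerate(p1) if vec == m1)
    i2 = next(i for i, (_, vec) in enumerate(p2) if vec == m2)
    if p1[i1][0] != p2[i2][0]:
        return 0                                            # different nodes
    rest1, rest2 = p1[i1 + 1:], p2[i2 + 1:]                 # below the shared node
    if not rest1 or not rest2:
        return 0                                            # shared node is a tip node
    return pmax_compare(rest1, rest2)                       # rules 3.2.2 and 3.2.3
```

For instance, two paths that share a maximal node a of vector (5,5) and continue to vectors (2,2) and (3,1) are ordered by those continuations, so the first path precedes the second.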

DEFINITION 11 

Eligible node: We say that a node n is “eligible” for expansion if it satisfies at least one of 
the following conditions. 

1. Let P_min be the minimum solution path based on “pmax-ordering”. If n belongs to P_min
then n is eligible.

2. A node n is eligible for expansion if it has at least one cost vector f(n) which is
non-dominated by the cost vector of every eligible solution path P such that pmax[P] is
less than f(n). A path is eligible if all its nodes are eligible. □

Based on the above definitions, the following properties are easy to establish for the 
algorithm MOA** (see appendix A). 

• The necessary and sufficient condition for a node n to be expanded is that all nodes on
the path P(s, n) from the start node s to the node n are eligible.

• If two paths P1 and P2 are eligible, then P1 is generated prior to P2 if and only if
P1 < P2 in pmax-ordering.

• MOA** finds the solution paths in the pmax-ordered sequence of the paths. 

If we also relax the assumption that any cost vector of a finite path dominates every cost 
vector of an infinite path (assumption A.3), then infinite paths may have finite cost vectors. 
If such a path is eligible up to infinite depth, then it is easy to see that MOA** will not 
terminate. Taking all these aspects into account, the general conditions of admissibility of 
MOA** may be stated as follows. 

Admissibility conditions of MOA** 

MOA** terminates with all non-dominated solutions iff: 

1. There is no infinite path which is eligible up to infinite depth, and 

2. Every non-dominated solution path is eligible, and 

3. The pmax-ordering of the solution paths describes the same sequence as the
K-ordering of their representative cost vectors.

The third condition may be relaxed by modifying step 5 of MOA** as follows: 

5. [IDENTIFY SOLUTIONS] 

If n is a goal node then 

5.1 Put n in SOLUTION_GOALS and its cost vector in SOLUTION_COSTS.

5.2 Remove dominated solutions (if any) from SOLUTION_GOALS and SOLUTION_COSTS.

5.3 Go to [Step 2]. 

The introduction of step 5.2 becomes necessary because if the third condition is relaxed 
then the solution nodes may not arrive in the K-ordered sequence of their cost vectors and 
it is possible that some solution node entered in SOLUTION_GOALS is dominated by the 
cost vector of some solution found later. 
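The modified step 5 can be sketched as follows, assuming SOLUTION_GOALS and SOLUTION_COSTS are kept as parallel Python lists (an assumption of this sketch; `record_solution` is our name). After a new goal is recorded, every recorded solution whose cost vector is dominated by another recorded cost vector is dropped from both lists.

```python
def dominates(a, b):
    """True if a dominates b: no worse in every component and not equal."""
    return all(x <= y for x, y in zip(a, b)) and a != b


def record_solution(goal, cost, solution_goals, solution_costs):
    """Steps 5.1-5.2: add a goal, then drop recorded solutions that are dominated.

    solution_goals and solution_costs are parallel lists updated in place.
    """
    solution_goals.append(goal)
    solution_costs.append(cost)
    keep = [i for i, c in enumerate(solution_costs)
            if not any(dominates(d, c) for d in solution_costs)]
    solution_goals[:] = [solution_goals[i] for i in keep]
    solution_costs[:] = [solution_costs[i] for i in keep]


goals, costs = [], []
for g, c in [("g1", (5, 5)), ("g2", (3, 7)), ("g3", (4, 4))]:
    record_solution(g, c, goals, costs)
print(goals, costs)   # ['g2', 'g3'] [(3, 7), (4, 4)]: g1 was dominated by g3
```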

8. Conclusion

In this paper we have addressed two major topics under multiobjective state space search. 
The first has been to refine the basic search scheme proposed by Stewart & White (1991) 
by incorporating algorithmic improvements. We have extended the standard technique of 
pathmax to the multiobjective framework and have shown that in a way it is more significant 
in the multiobjective model than in the conventional model. We have also incorporated 
ideas such as using an induced total order (such as K-order) on the partially ordered search space.

The other major issue addressed in this paper has been to develop memory bounded 
search strategies. We have shown that the major difficulty in extending standard memory 
bounded search schemes to this model is due to the presence of multiple candidate back-up 
cost vectors in the pruned space, and that the problem can be resolved by using a total 
order on the search. We have presented the algorithm MOR*0 to establish this fact. 
Appendix A. Algorithm MOA**

The algorithm outline (for trees) is given below. The set of heuristic vectors at node n is
denoted by H(n) and the set of cost vectors of node n is denoted by F(n). If any dimension
of a cost vector in F(n) is infinity, the cost vector is useless, since it
does not underestimate the cost of any finite solution. If all the cost vectors of a node are
infinite, then the node can be discarded altogether.

Algorithm MOA**

1. [INITIALIZE] 

OPEN ← {s}; SOLUTION_GOALS ← ∅; SOLUTION_COSTS ← ∅;

2. [TERMINATE]

If OPEN is empty then 

2.1 Output the solutions from SOLUTION-GOALS. 

2.2 Terminate. 

3. [SELECT]

3.1 Remove the node n in OPEN with the minimum representative cost vector
in K-order. Let this cost vector be f(n). Resolve ties in favor of goal
nodes, else arbitrarily.

4. [DOMINANCE CHECK] 

If the representative cost vector of node n is dominated by the cost 
vector of a solution in SOLUTION-COSTS then: 

4.1 Remove f(n) from F(n). 
4.2 Select the next non-dominated cost vector of n (in K-order).

4.2.1 If there is no non-dominated cost vector in F(n) then 

4.2.1.1 Discard node n. 

4.2.1.2 Go to [Step 2]. 

4.2.2 Otherwise declare the new vector as the representative cost of n 

4.2.2.1 Return node n to OPEN. 

4.2.2.2 Go to [Step 3]. 

5. [IDENTIFY SOLUTIONS]

If n is a goal node then 

5.1 Put n in SOLUTION_GOALS and its cost vector in SOLUTION_COSTS.

5.2 Go to [Step 2]. 

6. [EXPAND]

Expand n and examine its successors. 

6.1 Generate the successors of n. 

6.2 If n has no successors, Go to [Step 2]. 

6.3 Otherwise, for each successor nj of n, do the following:

6.3.1 Evaluate the non-dominated set of heuristics H(nj) using pathmax. 

6.3.2 Compute the set F(nj) by adding the cost vectors in H(nj) with G(rij). 

6.3.3 Determine the minimum cost vector in F(nj) using K-ordering. 

Declare that cost as the representative cost vector of nj. 

6.3.4 Enter nj in OPEN. 

7. [ITERATE] 

Go to [Step 2].
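A compact Python sketch of the outline above, under these assumptions: the search space is a tree given by a `children` function returning (child, edge cost vector) pairs, H(n) is supplied by a `heuristics` function returning mutually non-dominated vectors, pathmax (step 6.3.1) is left out, and lexicographic tuple order stands in for K-order. Function names are ours; the step numbers in the comments refer to the outline.

```python
import heapq
import itertools


def dominates(a, b):
    """True if a dominates b: no worse in every component and not equal."""
    return all(x <= y for x, y in zip(a, b)) and a != b


def vec_add(a, b):
    return tuple(x + y for x, y in zip(a, b))


def moa_star_star(start, children, heuristics, is_goal):
    """Sketch of MOA** on a tree; returns a list of (goal, cost vector) pairs."""
    tie = itertools.count()
    open_heap = []

    def push(node, g, F):
        F = sorted(F)                      # F[0] is the representative vector
        # goal nodes win ties (False sorts before True), then FIFO via the counter
        heapq.heappush(open_heap, (F[0], not is_goal(node), next(tie), node, g, F))

    g0 = tuple(0 for _ in heuristics(start)[0])
    push(start, g0, [vec_add(g0, h) for h in heuristics(start)])   # step 1
    solutions = []
    while open_heap:                                               # step 2
        rep, _, _, n, g, F = heapq.heappop(open_heap)              # step 3
        costs = [c for _, c in solutions]
        if any(dominates(c, rep) for c in costs):                  # step 4
            rest = [f for f in F[1:] if not any(dominates(c, f) for c in costs)]
            if rest:
                push(n, g, rest)                                   # 4.2.2
            continue                                               # 4.2.1: discard n
        if is_goal(n):                                             # step 5
            solutions.append((n, g))                               # record path cost g
            continue
        for child, edge in children(n):                            # step 6
            gc = vec_add(g, edge)
            push(child, gc, [vec_add(gc, h) for h in heuristics(child)])
    return solutions
```

A small usage example with the (admissible) zero heuristic:

```python
tree = {"s": [("a", (1, 3)), ("b", (3, 1))], "a": [("g1", (1, 1)), ("c", (2, 2))],
        "b": [("g2", (1, 1))], "c": [("g3", (1, 1))], "g1": [], "g2": [], "g3": []}
print(moa_star_star("s", lambda n: tree[n], lambda n: [(0, 0)],
                    lambda n: n.startswith("g")))
# [('g1', (2, 4)), ('g2', (4, 2))]; node c is discarded because (3, 5) is dominated
```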


A.1 Admissibility of MOA**

This section deals with the admissibility and optimality issues of MOA**. The properties
of MOA** are very similar to those of the algorithm A* in the conventional
search domain. Similar properties were proved for the algorithm MOA* by Stewart &
White (1991). The objective of this section is to establish the correctness
of MOA** and highlight some of the other properties of the algorithm. The following
assumptions define the domain in which MOA** is admissible.

Assumption A.1. The heuristic function is admissible.

Assumption A.2. The number of children of a node is finite.

Assumption A.3. All non-dominated solutions are of finite length and have finite cost 
vectors, and the cost vectors of infinite paths are dominated by solutions. 

The following two theorems establish the admissibility of MOA**. 

Theorem A.1. If there exists a goal node y in the search space, then MOA** is guaranteed
to terminate.
Proof. We first show that paths are expanded only up to a finite depth. Let us consider a
node n at infinite depth. By assumption A.3, each of its cost vectors is dominated by the 
cost vector of every solution path. Since the cost vector of the path P(s,y ) is finite, the 
cost vector of y and some cost vector of every ancestor of y dominate every cost vector of 
node n. Until y is selected either y or one of its ancestors must be present in OPEN. Since 
MOA** selects only those nodes that are non-dominated in OPEN and non-dominated by 
SOLUTION-COSTS, it follows that the node n at infinite depth will never be expanded. 

The number of successors of a node is finite (by assumption A.2), which also implies
that the number of paths up to a finite depth is finite. Since a finite set of paths is
expanded up to a finite depth, the termination of MOA** is guaranteed. □

Theorem A.2. MOA** is admissible, that is, it terminates with all non-dominated solutions
in the entire search space.

Proof. When the heuristics are admissible, every ancestor node of a non-dominated solution
node will contain a cost vector that either dominates or is equal to the cost vector of
the solution node. It follows that every ancestor of a non-dominated goal node is also
non-dominated. Therefore, until the solution node is found, OPEN will contain a non-dominated
node and MOA** cannot terminate. Since termination is guaranteed by theorem A.1, the
result follows. □

In this context it must be noted that the admissibility of MOA** is subject to the
admissibility of the heuristic function, that is, each ancestor node of a goal contains at least one
cost vector which dominates the cost vector of the solution. If there exists a non-dominated 
solution y in the search space such that some node n in the path from s to y does not have 
a cost vector which dominates the cost of y, then MOA** is not guaranteed to find the 
solution y. This happens when all the cost vectors of node n are dominated and node n 
is discarded by MOA**. In order to guarantee that all non-dominated solutions are found 
under such situations, an algorithm will have to expand dominated nodes as well. This
indicates that brute-force search strategies may be the only strategies that are guaranteed to
find all non-dominated solutions. However, it is easy to see that under such circumstances 
MOA** is guaranteed to find all those goals whose ancestors have admissible heuristics. 

Appendix B. The outline of algorithm MOR* 

The following is the outline of the algorithm MOR*. The algorithm uses a parameter called 
MAX which indicates the number of nodes that are allowed to be retained in the memory. 
The algorithm uses a variable called node_count to count the number of nodes currently
retained in the memory. 

Algorithm MOR* 

1. [INITIALIZE] 

OPEN ← {s}; CLOSED ← ∅;

SOLUTION_GOALS ← ∅; SOLUTION_COSTS ← ∅;
node_count ← 0


Heuristic strategies for multiobjective search 


Compute the set of cost vectors of s.

Assign the representative vector of s to F(s) and each F_i(s).

2. [TERMINATE] 

If OPEN is empty then Terminate. 

3. [SELECT 1] 

3.1 Remove the node n in OPEN with minimum F(n) in K-order. 

3.2 Resolve ties in favor of the goal nodes, else in favor of nodes at a greater depth. 

3.3 Set GL ← F(n)

4. [DOMINANCE CHECK] 

If F(n) is dominated by the cost vector of a solution in SOLUTION_COSTS then

4.1 Compute the next representative cost vector of n. 

4.1.1 If there is no such non-dominated vector then 

4.1.1.1 Discard node n . 

4.1.1.2 Go to [Step 2]. 

4.1.2 Otherwise return n to OPEN and Go to [Step 3]. 

5. [IDENTIFY SOLUTIONS] 

If n is a goal node then 

5.1 Put its cost vector in SOLUTION_COSTS.

5.2 Output the solution. 

5.3 Set F(n) ← ∞

5.4 Go to [Step 2] 

6. [EXPAND] 

6.1 Expand n generating all its successors. 

6.2 For each successor nj of n, do the following: 

6.2.1 Compute the representative cost vector F(nj) of nj.

6.2.2 Assign the vector F(nj) to each F_i(nj)

6.2.3 Set F_j(n) ← F(nj)

6.2.4 Enter nj in OPEN

6.2.5 Increment node_count

6.3 Put n in CLOSED 

7. [SELECT 2] 

7.1 Select the successor m of n with minimum F(m) in K-order. 

7.2 If F(m) < GL then select m for expansion and call it n. 

7.3 Go to [Step 4]. 

8. [COST REVISION]

8.1 Put the node m in a set Z 

8.2 Repeat the following steps until Z is empty. 

8.2.1 Remove a node p from Z. Let q be its parent and let p be the jth successor of q.

8.2.2 Set F_j(q) ← F(p)

8.2.3 Let F_min(q) denote the minimum among all F_i(q) in K-order.

8.2.4 If F(q) < F_min(q) then

8.2.4.1 Set F(q) ← F_min(q)

8.2.4.2 Put q in Z

9. [CONTINUE]

9.1 If node_count < MAX then Go to [Step 2] 

10. [PRUNING]

10.1 Select the leaf node r in OPEN with largest F(r) in K-order. 

10.2 If F\ (r) = GL then Go to [Step 3], 

10.3 Remove r from OPEN and decrement node_count. 

10.4 Let t be the parent of r. If t is in CLOSED then 

10.4.1 Remove t from CLOSED and put it in OPEN 

10.5 Go to [Step 10.1] 
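Step 8 (cost revision) can be sketched as a short propagation loop. This is a hypothetical rendering: the tree is held in plain dicts (`parent`, `child_index` giving j such that p is the jth successor of its parent, `F` for backed-up vectors, `F_i` for the per-successor vectors), lexicographic tuple order stands in for K-order, and stopping at the root is our addition since the outline does not mention it.

```python
def revise_costs(m, parent, child_index, F, F_i):
    """Step 8 of MOR*: propagate a changed F(m) toward the root.

    parent[p]      -> parent of p (None at the root)
    child_index[p] -> j such that p is the jth successor of its parent
    F[n]           -> backed-up cost vector of n
    F_i[n]         -> list of the per-successor backed-up vectors of n
    """
    Z = [m]                                    # 8.1
    while Z:                                   # 8.2
        p = Z.pop()                            # 8.2.1
        q = parent.get(p)
        if q is None:
            continue                           # reached the root
        F_i[q][child_index[p]] = F[p]          # 8.2.2
        f_min = min(F_i[q])                    # 8.2.3 (lexicographic stand-in)
        if F[q] < f_min:                       # 8.2.4: backed-up cost of q grew
            F[q] = f_min                       # 8.2.4.1
            Z.append(q)                        # 8.2.4.2
```

For example, if the only child c of a rises from (2,2) to (4,4) while a's sibling b keeps (5,5), both a and the root s are revised to (4,4), and the revision stops there.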
Abstract. Learning for problem solving involves acquisition and storage of
relevant knowledge from past problem solving instances in a domain in such a 
form that the information can be used to effectively solve subsequent problems 
in the same domain. Our interest is in the role of learning in problem solving 
systems that solve problems optimally. Such problems can be solved by an 
informed search algorithm like A*. Learning a stronger heuristic function leads 
to more effective problem solving. A set of arbitrary features of the domain
induces a clustering of the state space. The heuristic information associated with
each cluster may be learned. We discuss the use of a new form of information 
in the form of h*set (the set of optimum cost values of all nodes of the cluster) 
and present an algorithm for using the information that is more effective than 
A*. A possibilistic (fuzzy set theoretic) extension of this algorithm is also 
presented. This version can handle incomplete information and is expected 
to find solutions faster in the average case with controlled relaxation in the 
optimality guarantee. We also discuss how to make the best use of the features, 
when the system has memory restrictions that limit the number of classes that 
can be stored. 

Keywords. Learning; best first search; features; heuristic estimate; clustering. 


1. Introduction 
1.1 Search and A* 

A search problem consists of finding a sequence of operators to transform a given initial
state to a state matching the given goal description such that the sum of the costs
of the operators along the path is minimum (Nilsson 1980). For uninformed systematic
search strategies like depth first search and breadth first search and their variants, the
number of node expansions is an exponential function of the optimum solution length (Pearl
1984).
In order to reduce the number of node expansions, informed search algorithms use
some domain dependent information of a node, known as a heuristic estimate of the node,
usually in the form of a function of the node. Under certain assumptions on the value
of the heuristic estimate, these estimates can be suitably used by search algorithms for
reduced search complexity. A* is an informed search algorithm that uses an estimate
(underestimate) of the actual solution cost of the node as the heuristic information. A* 
guarantees that the goal found will be the one with an optimum solution cost when the 
heuristic information is an underestimate. If A* has access to perfect information, the 
number of nodes expanded will only be linear in the solution cost. 
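For reference, the textbook form of A* described above can be sketched in a few lines. This is a generic illustration, not the algorithms of this paper: `neighbors(n)` yields (successor, edge cost) pairs and `h(n)` is assumed never to overestimate the true remaining cost; both names are ours.

```python
import heapq


def a_star(start, goal, neighbors, h):
    """Expand the open node with least f = g + h; return (cost, path) or None."""
    open_heap = [(h(start), 0, start, [start])]
    best_g = {start: 0}
    while open_heap:
        f, g, n, path = heapq.heappop(open_heap)
        if n == goal:
            return g, path
        if g > best_g.get(n, float("inf")):
            continue                                  # stale entry, a cheaper path exists
        for m, c in neighbors(n):
            g2 = g + c
            if g2 < best_g.get(m, float("inf")):
                best_g[m] = g2
                heapq.heappush(open_heap, (g2 + h(m), g2, m, path + [m]))
    return None


graph = {"S": [("A", 1), ("B", 4)], "A": [("B", 2), ("G", 5)], "B": [("G", 1)], "G": []}
h = {"S": 3, "A": 2, "B": 1, "G": 0}                  # underestimates of the true costs
print(a_star("S", "G", lambda n: graph[n], lambda n: h[n]))   # (4, ['S', 'A', 'B', 'G'])
```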

1.2 Role of learning 

Most problem solving systems are expected to solve problems repeatedly from the same 
domain and often the same population of problems over their lifetime. In the course of 
problem solving an intelligent system can be expected to acquire additional experience or 
knowledge that will help it to solve subsequent problems of the domain more effectively. 
The time taken by best first search algorithms depends on the accuracy of the heuristic 
estimate. Therefore learning a more accurate cost function associated with a state will help 
to speed up the search process. 

For learning to be effective, we make the following assumption: problems come from the
same arbitrary but fixed distribution in the learning phase as in the problem solving phase.
This is the framework of distribution-free learning, and we claim that if the underlying
distribution is the same in both cases, with high probability, the information acquired by 
the system in the learning phase will be sufficient to solve most of the problems in the 
problem solving phase effectively. 

1.3 Background 

In traditional heuristic search, a heuristic function is used as a direct estimate of the cost 
of a node. This restricts the types of features that can be used as information. Efficient 
performance of an admissible heuristic search algorithm depends on finding a good, easily
computed underestimating feature. Such features are often hard to find.

A major problem with the use of heuristic functions is that the information content of 
an individual feature may not be very high. Often we come across features that capture 
some aspects of the problem domain only. Several features taken together can characterize 
a problem domain much better than any of the features taken individually by providing 
more discriminatory information. Use of multiple features has been explored by several 
researchers. Given a set of features, best first search algorithms traditionally compute the 
values of the corresponding heuristic functions and combine the values to yield the heuristic 
estimate of a node. The appropriate combination of the different feature values that will 
yield a more informed estimate of a node, while leaving the search procedure admissible, is 
an interesting problem. The standard form of combination is a linear polynomial (Samuel 
1963; Christensen & Korf 1986; Lee & Mahajan 1988; Bramanti-Gregor & Davis 1993). 
Learning in this context is used to determine optimal coefficients of such a polynomial. This 
method may not yield good results when there is no effective linear mapping from feature 
vector values to the actual cost of nodes. A linear discriminator of the features is very
restrictive and some form of nonlinearity has been proposed (Samuel 1967). Bramanti- 
Gregor & Davis (1993) present a method of combining features based on linear regression. 
They propose a linear combination of certain ad-hoc nonlinear forms of features. 

Politowski (1986) has used a model where the feature values are used to partition 
the state space into a number of equivalence classes. He advocates that the information 
corresponding to a class be explicitly represented as a mapping from the class to the 
information. This mapping can be learned during the training phase. We adopt this model 
as our basic model. 

1.4 Model 

The state space is first divided into classes by the features. Information associated with the 
classes is learned during the learning phase, and the acquired information is used to solve 
problems more effectively in the problem solving phase.

Classification 

Following Politowski (1986), we look upon the features as discriminators that determine 
the cost of a node. A feature, h, of a problem domain is a function which can assume a 
number of distinct values when applied to different states of the problem. When multiple 
features of a problem domain are available, a feature set can be used which will provide a 
vector of values for every state. Suppose that we have m features of the problem domain,
h = {h1, h2, ..., hm}.

The feature vector values induce a clustering of the state space. Each feature vector value k
identifies a class containing some number of nodes that are heuristically indistinguishable
from each other. Consider the set S_k of all nodes which have the identical value k as their
feature vector value:

S_k = {n : hvec(n) = k}.

We advocate that information be associated and stored with each such set S_k, to be used
during a best first search algorithm. 

Information associated with a class 

Corresponding to any specific value of the feature vector k at a node, we may store the
minimum of the h* values of all elements of the class. We may also store the set of h*
values of the elements of a class, represented by h*set(k):

$h^*set(\mathbf{k}) = \{v : h^*(n) = v,\ n \in S_{\mathbf{k}}\}$.

Algorithm

The h*set corresponding to the value of the feature vector at a node can provide more
information to an informed best first search algorithm than a single underestimate value
can. We present algorithms CS*, FS* and their variants, which use h*set and its
distribution as information and perform better than A* using the corresponding
single-valued estimate.




Sudeshna Sarkar et al 


1.5 Work done 

• Given any set of features, the minimum element of h*set can be used as a heuristic 
underestimate at a node that can be used by the algorithm A*. In this framework, 
we show how the use of multiple features drastically reduces the search time, while 
solving problems optimally. 

• Given a set of features of the problem domain, we show that the h*set corresponding 
to each feature vector value can be learned in the limit by solving a sufficient number of 
example problems in the training phase. We have devised best first search algorithms 
that make effective use of the information of the entire h*set to further reduce the 
number of node expansions. 

• However, these algorithms assume complete information about the h*set corresponding
to every feature vector value. A bounded risk model has been devised that will always
find some solution, and is guaranteed to find an optimum solution with a specified
confidence level. The algorithm uses the possibility distribution of the individual elements
of the h*set.

While previous work has laid much stress on accuracy and precision as the necessary
qualities of a good heuristic estimator, the discriminatory power of the heuristic feature
set has been largely overlooked. We lay special emphasis on devising more effective best
first search algorithms by making use of the discriminatory power of a set of features that
may help to distinguish between on-track and off-track nodes.

This paper is organized as follows: 

Section 2 provides the basic notations and definitions. Section 3 discusses the learning
phase. Section 4 presents admissible search algorithms for effective use of the information
learned. Section 5 provides the corresponding algorithms for the fuzzy set model. Section 6
concludes the paper.

2. Features, feature vectors and h*set 

Let S be the set of states of the problem space. The problem space has a set of operators
that change the state of the problem. For example, in the 8-puzzle, the states are the different
permutations of tiles, and the operators are single step moves of the blank position.
Corresponding to each application of an operator in moving from state $s_i$ to state $s_j$, a
cost $c(s_i, s_j)$ is associated.

A search problem instance is an element of $\mathcal{V}$, where $\mathcal{V} = S \times S$. In other words, a
search problem is a 2-tuple $(s, \gamma)$, where $s$ is called the start node and $\gamma$ is the goal node. A
problem can also be of the form $(s, \Gamma)$, where $\Gamma$ is a set of goal states. In optimum search,
the problem solving task is to find a sequence of operators that map the initial state to one
of the goal states such that the sum of the costs of the operators along the path is minimum.
First we provide certain preliminary definitions:

$s$ : A node designated as the start node for a specified problem.

$\gamma$ : A goal node for a specific problem.

$\Gamma$ : A set of goal nodes for a specific problem.

$(n_i, n_j)$ : Arc connecting the node $n_i$ and the node $n_j$.

$c(n_i, n_j)$ : Cost of the arc connecting the node $n_i$ and the node $n_j$.

$P = (n_0, n_1, \ldots, n_k)$ : Path in a graph consisting of a sequence of arcs connecting $n_0$
through $n_k$.

$P \mid n$ : If $P = (n_0, n_1, \ldots, n_k)$, then $P \mid n = (n_0, n_1, \ldots, n_k, n)$.

$P_{n_1-n_2}$ : A path from $n_1$ to $n_2$; $P_{n_1-n_2} = (n_1, \ldots, n_2)$.

$PP_{s-n}$ : Pointer path in a graph obtained by tracing the pointers back from $n$ to $s$ in the
current search tree. It is the cheapest known path from $s$ to $n$ at any given point during
the search.

$C_p(P)$ : Cost of path $P$. If $P = (n_0, n_1, \ldots, n_k)$, then $C_p(P) = \sum_{i=0}^{k-1} c(n_i, n_{i+1})$.

$C_p^*(n_1, n_2)$ : Cost of the minimal cost path from $n_1$ to $n_2$.

$g^*(n)$ : Cost of the cheapest path from the implicit start state $s$ to the node $n$; $g^*(n) = C_p^*(s, n)$.

$g(n)$ : Cost of the cheapest known path from the start state $s$ to the node $n$ at any given point
during the search:

$g(n) = \min_P C_p(P)$ over all known paths $P = (s, \ldots, n)$.

In other words, $g(n) = C_p(PP_{s-n})$.

$h^*(n)$ : Cost of the cheapest path from $n$ to the goal state $\gamma$ of the given problem; $h^*(n) =
C_p^*(n, \gamma)$, where $\gamma$ is the implicit goal. Note that the goal may vary from
problem to problem; however, for notational convenience, $\gamma$ has been dropped.

$f(n, P)$ : Estimate of the cost of the cheapest solution path which is an extension of the path $PP_{s-n}$.

$h$ : A feature of the problem domain.

$\mathbf{h} = (h_1, h_2, \ldots, h_k)$ : The set of features used (feature vector); $h_1, h_2, \ldots, h_k$ are features.

$\mathbf{v} = (v_1, v_2, \ldots, v_k)$ : Instantiated value of the feature vector, where $v_i$ is the value of $h_i$
for the given state.

$hvec(n)$ : The feature vector value corresponding to node $n$; $hvec(n) = (v_1, v_2, \ldots, v_k)$
where $v_i = h_i(n)$.


2.1 What are features?

A heuristic feature is traditionally an estimate of the cost of an optimum cost path from
the current state to the goal state of the current problem instance. We consider a feature to
be any function of the goal state γ, the start state s, and the partial search tree generated
so far. However, for convenience, we will use h(n) to denote the value of a feature h at
node n.




2.2 Role of features in clustering 

We will show how a set of features can be effectively used to strengthen the heuristic 
estimate of a node. We look upon features of the problem domain as a means of dividing
the problem state space into a number of classes. Increase in the number of features leads 
to further refinement of the classes, each original class being subdivided into a number 
of classes based on the values of the new feature set. Instead of associating information 
with a state, information can be associated explicitly with every class. We shall see that 
with the types of information that we consider, the more refined the classes, the greater 
the accuracy of the information. 

Example 1. Consider the 8-puzzle problem. The number of distinct states is 181440
(9!/2). The misplaced tiles heuristic takes on 9 different values, 0, ..., 8, and divides the
state space into 9 different classes. Similarly, the Manhattan distance estimate divides the
state space into about 25 different classes. Taken together, they divide the state space into
about 78 different classes. The information associated with these subclasses induced by a
set of features is stronger than the information that can be associated with classes induced
by each of the component features. □
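To make the feature vector of Example 1 concrete, the sketch below computes the misplaced tiles and Manhattan distance features of an 8-puzzle state and pairs them into hvec(n). It is only an illustration under our own assumptions: states are row-major 9-tuples with 0 for the blank, the goal layout is 1 2 3 / 4 5 6 / 7 8 _, and the names `GOAL`, `misplaced_tiles`, `manhattan` and `hvec` are hypothetical. The sequence count feature is left out.

```python
from typing import Dict, Tuple

State = Tuple[int, ...]  # 9 entries in row-major order; 0 marks the blank

# Assumed goal layout (an illustrative choice):  1 2 3 / 4 5 6 / 7 8 _
GOAL: State = (1, 2, 3, 4, 5, 6, 7, 8, 0)


def misplaced_tiles(state: State, goal: State = GOAL) -> int:
    """Number of non-blank tiles that are not in their goal position."""
    return sum(1 for i, tile in enumerate(state) if tile != 0 and tile != goal[i])


def manhattan(state: State, goal: State = GOAL) -> int:
    """Sum over non-blank tiles of row distance plus column distance to the goal cell."""
    home: Dict[int, int] = {tile: i for i, tile in enumerate(goal)}
    total = 0
    for i, tile in enumerate(state):
        if tile == 0:
            continue  # the blank is not counted
        j = home[tile]
        total += abs(i // 3 - j // 3) + abs(i % 3 - j % 3)
    return total


def hvec(state: State) -> Tuple[int, int]:
    """Feature vector value (h1, h2) = (Manhattan distance, misplaced tiles)."""
    return (manhattan(state), misplaced_tiles(state))
```

States with equal `hvec` values fall into the same class; for instance, both states one move away from this goal have the value (1, 1).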

Note that the above scheme gives rise to a finite number of classes only for features
which take on a finite number of distinct values. To be able to handle features that take on
continuous values, or a large number of values, we will have to divide the feature values
into intervals. The feature vector in this case will be a vector of intervals rather than a
vector of values.

2.4 minh and h*set 

After classification, some information must be associated with the classes, which can be
used as the heuristic information for the nodes belonging to the class. Search algorithms use
estimates of the optimum cost as the heuristic information. A class may contain nodes with
several different optimum cost values. The simplest form of information is the minimum
possible cost value corresponding to a class, denoted by minh. We introduce the concept
of h*set associated with a class.

DEFINITION 1 

The minh of a given feature vector value k is the minimum of the h* values of the nodes
comprising the class identified by the feature vector value k:

$minh(\mathbf{k}) = \min\{h^*(n) : hvec(n) = \mathbf{k}\}$.

DEFINITION 2

The h*set of a given feature vector value k is the set of feasible optimum cost values, i.e.,
h*(n) for all nodes n with feature vector value k, and is denoted by h*set(k).

There will exist many nodes $n_1, n_2, n_3, \ldots$ which have the same feature vector value k, i.e.,

$hvec(n_1) = hvec(n_2) = hvec(n_3) = \cdots = \mathbf{k}$.
Table 1. Examples of h*set in 8-puzzle.

Feature set      Feature vector value   minh   h*set
(h1)             (10)                   10     10 12 14 16 18 20 22
(h1, h2)         (10, 8)                10     10 12 14 16 18
(h1, h2, h3)     (10, 8, 8)             12     12 14
                 (10, 8, 9)             10     10 12 16
                 (10, 8, 11)            10     12 14
                 (10, 8, 12)            16     16
                 (10, 8, 13)            14     14 16
                 (10, 8, 15)            18     18

These different nodes may have different values of h*. The h*set corresponding to the
feature vector value is the set of the h* values of all nodes with that feature vector value, i.e.,

$h^*set(\mathbf{k}) = \bigcup_{n \,:\, hvec(n) = \mathbf{k}} \{h^*(n)\}$.

Corresponding to every element x of h*set(k), there will exist some node n with feature
vector value k and optimum solution cost x.

Note that $minh(\mathbf{k}) = \min\{h^*set(\mathbf{k})\}$.

Given a node n, we will, for convenience, refer to the h*set at the node n to mean the
h*set corresponding to the feature vector value at the node, $h^*set(hvec(n))$.

Example 2. In table 1 we show some typical examples of h*set values in the domain of
8-puzzle. The example illustrates how the addition of more features induces a finer
clustering of the state space, and how stronger information may be associated with these
subclasses. The heuristic features used are

• Manhattan distance¹, represented by h1,

• Misplaced tiles², represented by h2,

• Sequence count³, represented by h3. □

¹ Manhattan distance: the sum of the distances of each tile from its home position; denoted by manhattan.
² Misplaced tiles: the number of tiles not in their correct position as in the goal state.
³ Sequence count: obtained by checking around the noncentral squares in turn, allotting 2 for every tile not followed
by its proper successor and 0 for every other tile; a piece in the centre scores 1. We denote this by sequence count.

3. Learning phase

This phase must precede problem solving. In this phase, the mapping from feature values
at states to the corresponding estimates is learnt. In this section, we discuss the number
of problems that must be solved in this training phase for effective problem solving using
this information. In Bramanti-Gregor & Davis (1993), the mapping function from features
to estimate is assumed to be linear. If a close linear fit to the actual function does exist, the
equation of the line can be obtained by using a few examples for training. We instead need
to learn the function without assuming that it can be approximated by any particular
form.

Let $(h_1, h_2, \ldots, h_k)$ be a given set of features of the domain. The system solves
representative problems in the domain using some guaranteed admissible search algorithm,
and learns the full range of feasible h* values (optimum cost from a node to the goal node),
h*set(k), corresponding to all nodes having the feature vector value k.
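The learning step can be sketched as follows, assuming each training problem has already been solved by an admissible search that returns one optimum path together with its arc costs. Every suffix of an optimum path is itself optimum, so the remaining cost from $n_i$ to the goal along that path equals $h^*(n_i)$; these values are accumulated into h*set keyed by the feature vector value, and minh is read off as the minimum. The function name `learn_hstar_sets` and its input format are hypothetical, and only nodes on the returned optimum path contribute, since the h* values of other nodes met during training are not directly available from a single solution.

```python
from collections import defaultdict
from typing import Callable, Dict, Hashable, Iterable, Sequence, Set, Tuple

FeatureVec = Hashable


def learn_hstar_sets(
    solutions: Iterable[Tuple[Sequence[Hashable], Sequence[float]]],
    hvec: Callable[[Hashable], FeatureVec],
) -> Tuple[Dict[FeatureVec, Set[float]], Dict[FeatureVec, float]]:
    """Accumulate h*set(k) and minh(k) from optimum solution paths.

    Each solution is (path, costs): path = (n_0, ..., n_k) is an optimum path ending
    at a goal, and costs[i] = c(n_i, n_{i+1}).
    """
    hstar_set: Dict[FeatureVec, Set[float]] = defaultdict(set)
    for path, costs in solutions:
        if len(costs) != len(path) - 1:
            raise ValueError("expected one arc cost per consecutive pair of nodes")
        remaining = 0.0
        hstar_set[hvec(path[-1])].add(remaining)       # h* of the goal itself is 0
        for i in range(len(path) - 2, -1, -1):
            remaining += costs[i]                      # optimum cost from path[i] to the goal
            hstar_set[hvec(path[i])].add(remaining)
    minh = {k: min(v) for k, v in hstar_set.items()}   # Definition 1
    return dict(hstar_set), minh
```

With a toy feature such as `lambda n: n % 3` on integer-labelled nodes, two short solved paths already populate several classes with more than one h* value.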

For a particular start state s, a necessary condition for our algorithm to terminate
with an optimum solution is that there exists an optimum cost path from s to γ such that,
corresponding to every state in the path, some state was encountered in the learning phase
with the same feature vector value and the same value of h*.

We define $S(\mathbf{v}, x)$ to be the equivalence class of states containing all nodes n s.t.

$\forall n \in S(\mathbf{v}, x):\ hvec(n) = \mathbf{v} \ \text{and}\ h^*(n) = x$.

Let $P^*_{s-\gamma}$ be an optimum path from s to γ, given by

$P = (n_0 = s, n_1, n_2, \ldots, n_k = \gamma)$.

The necessary condition for this path to be found in the problem solving phase is: for all
i, $0 \le i \le k$, at least one member of $S(hvec(n_i), h^*(n_i))$ was encountered in the learning
phase.

Suppose we are working with a feature set $\langle h_1, h_2, \ldots, h_k \rangle$, which induces M
clusters in the state space, represented as $C_1, C_2, \ldots, C_M$. We run a
number of example problems and obtain T points which are used for learning. We assume
a model where the examples come from an arbitrary distribution over a given population
of problems, and the distribution from which problems are drawn is assumed to be the same
in the problem solving phase as in the learning phase. To be able to learn the class $C_i$,
the set of T sample points must contain some examples of class $C_i$. Suppose
the probability distribution is not uniform, but states occur according to some arbitrary but
fixed distribution $D^+$. In this case perfect learning of the h*set is guaranteed only in the
limit. But suppose we relax the guarantee of completeness of learning and only require
that, with confidence $\ge 1 - \delta$, we obtain the optimum solution. The class $C_i$ will be
correctly learnt if

• for A*(minh): an example is found where a node belonging to class $C_i$ has its h* value
equal to $minh(hvec(C_i))$.

• for CS*: for each x in $h^*set(C_i)$, an example is found where a node belonging to class
$C_i$ has an h* value of x.

The class $C_i$ will be approximately correctly learnt if

• for A*(minh): an example is found where a node belonging to the class $C_i$ maps to the
h* value at the ε-cut of the distribution corresponding to $hvec(C_i)$.

• for CS*: for each x in $h^*set(C_i)$ whose probability is $\ge \varepsilon$, an example
is found where a node belonging to class $C_i$ has the h* value x.
Let the probability of occurrence of a class be $\ge \varepsilon$. Let N be the maximum length of an
optimum solution path, and let X be the number of trials necessary for the (δ) guarantee.
Let the number of sample points be T. Then the probability that at least one of the classes
containing some state in the optimum path was not encountered in the X trials is at most
$N(1-\varepsilon)^X$. The δ confidence guarantee requires

$N(1-\varepsilon)^X < \delta$,

which, since $(1-\varepsilon)^X \le e^{-\varepsilon X}$, holds whenever

$N e^{-\varepsilon X} < \delta$, i.e., $e^{\varepsilon X} > N/\delta$, or $X > \frac{1}{\varepsilon}\ln\frac{N}{\delta}$.

Therefore, if we want to find optimum solutions with a (δ) guarantee, it suffices to have
collected data for h*set from $O\!\left(\frac{1}{\varepsilon}\log\frac{N}{\delta}\right)$ random examples from the same distribution as
the example problems.
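To get a feel for the size of this bound, the following sketch evaluates the smallest integer X satisfying $X > (1/\varepsilon)\ln(N/\delta)$ and then checks the union bound $N(1-\varepsilon)^X$ for that X. The function names and the sample values N = 30, ε = 0.01, δ = 0.05 are illustrative assumptions, not figures from the experiments reported here.

```python
import math


def trials_needed(n_max: int, eps: float, delta: float) -> int:
    """Smallest integer X with X > (1/eps) * ln(n_max / delta)."""
    return math.floor(math.log(n_max / delta) / eps) + 1


def miss_probability_bound(n_max: int, eps: float, x: int) -> float:
    """Union bound N(1 - eps)^X on missing some class on the optimum path."""
    return n_max * (1.0 - eps) ** x
```

For N = 30, ε = 0.01 and δ = 0.05 this gives X = 640, for which $N(1-\varepsilon)^X \approx 0.048 < \delta$, illustrating that the exponential bound is slightly conservative.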

Now consider classes with probability $< \varepsilon$. The maximum probability that a state falls in one
of these classes is $T\varepsilon$. The probability that at least one of the states on the solution path
falls in such a class is therefore at most $TN\varepsilon$. We want this probability to be less than δ,
which holds if $\varepsilon < \delta/(TN)$, i.e., $1/\varepsilon > TN/\delta$. Therefore, if we look at

$\frac{TN}{\delta}\log\frac{N}{\delta}$

examples, we can ensure that the probability of failure will be $< \delta$.

Once we are able to learn the information corresponding to every feature vector value, 
we are ready to use it to solve future problems more effectively. 

4. Perfect information model with crisp h*set 

In this section, we discuss algorithms that make use of the learnt information to perform
admissible search. In § 4.1, we discuss the use of minh as the heuristic evaluation function
in A*. Results are provided for the domain of 8-puzzle, which serves as a benchmark for
comparing the performance of the other algorithms.

4.1 Using minh as information 

The simplest type of information that we can associate with every class is an estimate of
the cost value of the nodes belonging to the class. To be able to run algorithm A*, the best
underestimate that can be associated with any cluster $C_i$ with feature vector value $\mathbf{k}_i$ is
the minimum of the h* values of the nodes comprising the cluster. This is the scheme used
by Politowski (1986). It attempts to capture the full information available from all the given
features.

We have run several randomly generated examples in the domain of 8-puzzle using a
guaranteed underestimating heuristic function. From the solution paths obtained, we have
computed the minimum element of the h*set corresponding to each feature vector value. This
value is then used in the second phase as the heuristic estimate of a node in algorithm
A*. 10000 different random examples were then run with this new heuristic estimate. We
observed that the number of node expansions was reduced compared with A* using the
Manhattan distance heuristic, and all the solutions obtained were optimum. The results of
running the algorithm after learning with different feature sets are presented in table 2. The
table illustrates how combining different features drastically reduces node expansions.
Note that although the misplaced tiles heuristic is dominated by the Manhattan estimate,
using both estimates together gives rise to a significant improvement in search performance.

Table 2. Results of running A* with minimum of h*set for various feature sets.

Features used                                   % Nodes expanded
Manhattan                                       100
Manhattan, misplaced tiles                      81
Manhattan, sequence count                       77
Manhattan, misplaced tiles, sequence count      42
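A minimal sketch of this second phase is ordinary A* in which the heuristic at a node is looked up as minh[hvec(n)] from the learnt table rather than computed directly. Assuming the table was learnt completely, minh(k) ≤ h*(n) for every node n in class k, so the lookup is an admissible heuristic and A* still returns optimum solutions. The function name `astar_minh`, the dictionary-based graph and the toy data in the test are hypothetical.

```python
import heapq
import itertools
import math
from typing import Callable, Dict, Hashable, List, Optional, Set, Tuple

Node = Hashable


def astar_minh(
    start: Node,
    goals: Set[Node],
    succ: Dict[Node, List[Tuple[Node, float]]],
    hvec: Callable[[Node], Hashable],
    minh: Dict[Hashable, float],
) -> Tuple[Optional[List[Node]], float, int]:
    """A* whose heuristic at n is the learnt value minh[hvec(n)].

    Returns (solution path, cost, number of expansions).
    """
    tie = itertools.count()                      # FIFO tie-breaking among equal f values
    best_g: Dict[Node, float] = {start: 0.0}
    heap = [(minh[hvec(start)], next(tie), start, 0.0, [start])]
    expansions = 0
    while heap:
        _, _, n, g, path = heapq.heappop(heap)
        if g > best_g[n]:
            continue                             # stale entry: a cheaper path to n was found later
        if n in goals:
            return path, g, expansions
        expansions += 1
        for m, cost in succ.get(n, []):
            g_m = g + cost
            if g_m < best_g.get(m, math.inf):
                best_g[m] = g_m                  # redirect m's pointer (reopens m if needed)
                heapq.heappush(heap, (g_m + minh[hvec(m)], next(tie), m, g_m, path + [m]))
    return None, math.inf, expansions
```

In the test graph, nodes a and b share a class, so a inherits the more optimistic minh value of that class; the lookup remains admissible but is less informed for a, which is the kind of loss that finer feature sets reduce.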

4.2 Using h*set as information 

Our objective is to find out whether other, stronger forms of information can be associated with
the classes and used to make search more effective. If a best first search algorithm uses a
single value as information, minh is the best estimate that can be associated with a class
while guaranteeing admissibility. However, if the set of feasible optimum cost values of the
states comprising a class is associated with the class, the heuristic estimate of a node can be
further reinforced by using path information. In this section we present algorithm CS*, which
makes use of the h*set information to make search more effective.


4.2a Some definitions: If the h*set is known for each class, the possible values of the
optimum solution cost of the problem that can be obtained by extending the current path
are given by fset, which represents the set of feasible values of the optimum solution cost
of a path that is constrained to pass through n with a g value equal to the cost of the current
pointer path.

DEFINITION 3 

The fset of a node n, given the pointer path P from s to n and the value of h*set at the
node (i.e., h*set(hvec(n))), is given by

$fset(n, P) = \{x : x = g(n) + y,\ y \in h^*set(hvec(n))\}$.

DEFINITION 4 

We define the minf value of a node n, minf(n, P), to be the minimum element of the
corresponding fset:

$minf(n, P) = \min\{fset(n, P)\}$.
We make use of the following property: if C* is the optimum solution cost from node
s to a node γ, and a node n lies on any optimum solution path from s to γ, then the sum
of the optimum cost from the start node s to the node n and that from node n to the goal
node γ equals C*. This implies that the intersection of the fsets of nodes lying on an optimum
solution path must be non-empty. We introduce the notion of pathint, which has a parallel
in the concept of pathmax (Mérő 1984; Dechter & Pearl 1985) in best first algorithms
using single valued estimates.

DEFINITION 5 

The pathint of a node n along a pointer path $P = (n_0, n_1, \ldots, n_k)$, with $n_k = n$, is defined to be the
intersection of the fsets of all nodes belonging to the path. It is denoted by Pset(n, P):

$Pset(n, P) = \bigcap_{i=0}^{k} fset(n_i, (n_0, n_1, \ldots, n_i))$.

This means that if C* is the optimum solution cost, it will be a member of the pathint of 
all nodes on any optimal cost solution path. Retaining the Pset at every expanded node 
results in better selection of nodes for expansion than what is provided by A*. 

DEFINITION 6 

We define the minP value of a node n along a path P, minP(n, P) to be the minimum 
element of its Pset. 


Example 3. The above definitions are illustrated below with reference to the example graph 
in figure 1. 



Figure 1. A search graph. (The set shown against each node denotes the value of h*set at the node.)
$fset(n_1, (n_1)) = \{4, 6, 11\}$,  $Pset(n_1, (n_1)) = \{4, 6, 11\}$

$fset(n_3, (n_1, n_3)) = \{4, 7, 8, 11\}$,  $Pset(n_3, (n_1, n_3)) = \{4, 11\}$

$fset(n_2, (n_1, n_2)) = \{5, 6, 7\}$,  $Pset(n_2, (n_1, n_2)) = \{6\}$

$fset(n_3, (n_1, n_2, n_3)) = \{3, 6, 7, 10\}$,  $Pset(n_3, (n_1, n_2, n_3)) = \{6\}$

$fset(n_5, (n_1, n_2, n_5)) = \{3, 5, 10\}$,  $Pset(n_5, (n_1, n_2, n_5)) = \emptyset$

$minf(n_2, (n_1, n_2)) = 5$,  $minP(n_2, (n_1, n_2)) = 6$

DEFINITION 7 

The upper bound to the solution cost obtained at node n, denoted by ub(n, P), is the
maximum cost of an optimum solution path constrained to be an extension of path P. At
any point during the search process, when the set of nodes $\mathcal{S}$ has been expanded, minub
is defined to be the minimum of the upper bounds of the nodes:

$minub = \min_{n \in \mathcal{S}} \{ub(n, PP_{s-n})\}$.

In this section, we outline a best first search algorithm which uses the information
about the h*set to direct its search process. We show that the use of pathint with h*set
provides a more informed and stronger pruning criterion than pathmax using a single
valued estimate. We outline algorithm CS*, which uses h*set as state information and pathint
for path information. It is guaranteed not to be worse than algorithm A* using similar
information. We also present the iterative deepening version of the algorithm, CIDS*. In
addition to making use of pathint to convey path information, it uses front-union to transfer
additional information from one iteration to the next.

4.2b Algorithm CS*: Observation 1. If an extension of the path $P_{s-n_k}$ is an optimum
solution from s to a goal γ, then

$C^* \in fset(n_k, P_{s-n_k})$.

Suppose node $n_2$ is a child of node $n_1$. A necessary condition for the arc $(n_1, n_2)$ to be
included in an optimum cost solution path is that $C^* \in fset(n_1, PP_{s-n_1})$ and $C^* \in
fset(n_2, PP_{s-n_2})$, which is possible only if the intersection of these two fsets is non-empty,
because we must have

$C^* \in fset(n_1, PP_{s-n_1}) \cap fset(n_2, PP_{s-n_2})$.

When a node $n_k$ is generated, let $PP_{s-n_k}$ be the pointer path from s to $n_k$. We note that if
an optimum solution path can be reached by expanding the node $n_k$ along this path, then
the optimum solution cost C* must be a member of the fsets of all nodes included in the pointer path.
This leads to the following observation.

Observation 2. If the pathint at a node becomes empty, that branch of the search tree can
be pruned, as it is guaranteed not to lead to an optimum solution.

We may therefore consider retaining the pathint information at every node. The concept
of pathint corresponds to the concept of pathmax (Dechter & Pearl 1985) in traditional
search using a single estimate at a node, but provides a much stronger pruning criterion. It
enables us to remove from consideration those values of feasible optimum solution costs
of the path for which there exists at least one node along the path whose set of feasible
optimum solution costs does not contain that particular cost.

Observation 3. The minP value of a node can be effectively used as the estimate of a node 
during the node selection process. 

We outline the algorithm CS* below which is a best first algorithm that uses minP as the 
heuristic estimate at a node. 


Algorithm 

With every node we associate the following information:

(n, fset(n, P), Pset(n, P), parent)

1. [INITIALIZE:] minub ← MAXh*  ;; the maximum cost of a solution in the domain.
   OPEN ← {(s, fset(s, (s)), Pset(s, (s)))}, where fset(s, (s)) = Pset(s, (s)) = h*set(hvec(s))
   CLOSED ← ∅

LOOP:

2. [TERMINATE:] If OPEN is empty, exit with failure.

3. [SELECT:] Select the node n from OPEN with the minimum value of minP(n, PP_{s-n}).

4. [CHECK FOR GOAL:] If n is a goal node, exit with the solution PP_{s-n}.

5. [EXPAND and PRUNE:] Expand node n, generating the set M of its successors, and
   attach to them pointers back to n.
   For all m ∈ M:
      If m ∉ OPEN ∪ CLOSED,
         add the tuple (m, fset(m, PP_{s-m}), Pset(m, PP_{s-m})) to OPEN if Pset(m, PP_{s-m}) ≠ ∅.
      If m ∈ OPEN ∪ CLOSED,
         direct its pointers along the path of lowest value of g(m).
         If m required pointer adjustment and was on CLOSED, reopen it.
      If max{fset(m, PP_{s-m})} < minub,
         set minub ← max{fset(m, PP_{s-m})}.
   Remove from OPEN those nodes whose lower bound exceeds minub.

6. [CONTINUE:] Go to LOOP.
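The steps above can be sketched in Python as follows. This is a simplified rendering under stated assumptions, not the authors' implementation: OPEN is a binary heap ordered by minP with FIFO tie-breaking; instead of moving pointers in place, a cheaper path to a node m pushes a fresh entry and stale entries are skipped when popped (which also covers reopening CLOSED nodes); nodes whose minP exceeds minub are discarded lazily when popped; each new Pset is clipped at minub as allowed by Lemma 2; h*set is looked up per node, standing in for h*set(hvec(n)); and minub starts at infinity in place of MAXh*. The names `cs_star`, `succ` and `hstar_set` are hypothetical.

```python
import heapq
import itertools
import math
from typing import Dict, FrozenSet, Hashable, Iterable, List, Optional, Set, Tuple

Node = Hashable


def cs_star(
    start: Node,
    goals: Set[Node],
    succ: Dict[Node, List[Tuple[Node, float]]],
    hstar_set: Dict[Node, Iterable[float]],
) -> Tuple[Optional[List[Node]], float, List[Node]]:
    """Best-first search ordered by minP (smallest element of the pathint).

    hstar_set[n] stands in for h*set(hvec(n)). Returns (path, cost, expanded nodes).
    """
    tie = itertools.count()                       # FIFO tie-breaking among equal minP
    minub = math.inf                              # stands in for MAXh*
    pset0 = frozenset(hstar_set[start])           # g(s) = 0, so Pset(s,(s)) = fset(s,(s)) = h*set
    best_g: Dict[Node, float] = {start: 0.0}
    open_heap = [(min(pset0), next(tie), start, 0.0, pset0, [start])]
    expanded: List[Node] = []
    while open_heap:                              # step 2: empty OPEN means failure
        min_p, _, n, g, pset, path = heapq.heappop(open_heap)   # step 3: SELECT
        if g > best_g[n] or min_p > minub:
            continue                              # stale entry, or lower bound exceeds minub
        if n in goals:                            # step 4: CHECK FOR GOAL
            return path, g, expanded
        expanded.append(n)
        for m, cost in succ.get(n, []):           # step 5: EXPAND and PRUNE
            g_m = g + cost
            f_m: FrozenSet[float] = frozenset(g_m + y for y in hstar_set[m])
            minub = min(minub, max(f_m))          # Lemma 2: global upper bound on C*
            if g_m >= best_g.get(m, math.inf):
                continue                          # keep pointers along the cheapest known path
            pset_m = frozenset(x for x in pset & f_m if x <= minub)
            if not pset_m:
                continue                          # empty pathint: prune (Observation 2)
            best_g[m] = g_m
            heapq.heappush(open_heap, (min(pset_m), next(tie), m, g_m, pset_m, path + [m]))
    return None, math.inf, expanded
```

In the toy graph used to exercise this sketch, node c is never placed on OPEN because its pathint is empty, and the search returns the optimum path s, b, g of cost 3 after expanding s, a and b.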


Computation of Pset(n, PP_{s-n}): If $n_i$ and $n_j$ lie on an optimum path from s to γ, then

$f^* \in fset(n_i, PP_{s-n_i}) \cap fset(n_j, PP_{s-n_j})$.

The pathint for a path is computed incrementally as follows:

$Pset(s, (s)) = fset(s, (s))$,

$Pset(n, P) = Pset(m, P') \cap fset(n, P)$ if $P = (P' \mid n)$, where $P'$ is the pointer path to the parent m of n.

Example 4. We illustrate the algorithm by showing how it works for the example
graph of figure 1. Initially $s = n_1$ is on OPEN. It is put on CLOSED, and its successors
$n_2$ (Pset = {6}), $n_3$ (Pset = {4, 11}) and $n_4$ (Pset = {4, 11}) are generated and put on OPEN.
At this point, minub = 7 (from node $n_2$).

$n_3$ is selected first for expansion ($minP(n_3, (n_1, n_3)) = 4$). Its successors are γ (Pset =
∅) and $n_7$ (Pset = {11}). γ is not put on OPEN because its Pset is empty, and $n_7$ is not put
on OPEN because its minf value (10) exceeds the value of minub (7).

$n_4$ is expanded next, with a minP value of 4, and $n_6$ is generated with Pset = {4}. Its
minP value is 4, so it is the next candidate for expansion; its successor $n_8$ is pruned
because its Pset is empty. Now $n_2$ is expanded: $n_5$ is pruned because its Pset is empty,
and $n_3$ is put on OPEN with Pset = {6}. $n_3$ is now expanded, $n_7$ is pruned (empty Pset),
and γ is put on OPEN. γ is next selected for expansion, found to be a goal node, and the
solution path $(n_1, n_2, n_3, \gamma)$ is output.

To summarize, CS* expands nodes in the order $n_1, n_3, n_4, n_6, n_2, n_3, \gamma$. It may be
observed that algorithm A*, using minf as the evaluation function at a node, performs nine
expansions on this graph (beginning with $n_1$, $n_4$ and ending with $n_8$, $\gamma$), against seven for CS*. □


4.2c Properties of CS*: 

Lemma 1. If a node $n_k$ expanded by CS* lies on an optimum path from s to a goal node
$\gamma \in \Gamma$, then $C^* \in fset(n_k)$.

Proof. The result follows from observation 1. □

Lemma 2. The value of minub provides a global upper bound to the optimum solution cost 
value C* for the given problem. 

Proof. minub = max{fset(n, P)} for some node n and some path $P_{s-n}$. Therefore,
minub = g(n, P) + x, where x = max{h*set(hvec(n))}. A finite value of x implies
that a finite solution path must exist from node n to a goal node γ with optimum solution
cost ≤ x. Since we have already found a path of cost g(n, P) from s to n, there must exist
a solution path from s to γ of cost ≤ g(n, P) + x = minub. □

The value of minub can be used to restrict the Pset value of a node by removing from 
consideration those values that are greater than minub. 

COROLLARY 1 

If P is a path from s to n, and minP(n, P) > minub, P cannot be part of an optimum cost
path from s to goal γ.

While this has no effect on the number of nodes expanded, this property enables us to
prune nodes from the OPEN set.

Lemma 3. At any time before CS* terminates, there exists on OPEN a node $n_i$ which is on
an optimum path $P^*_{s-\gamma}$, s.t. $C^* \in Pset(n_i, P_{s-n_i})$.

Proof. The proof is similar to that of A* (Nilsson 1980). □

COROLLARY 2 

If $n_i$ is the shallowest node on OPEN belonging to an optimum solution path $P^*_{s-\gamma}$, then
$g(n_i) = g^*(n_i)$.

Note. Our evaluation function at a node is path dependent, and the same node may
be expanded more than once. However, it is ensured that the minP values of the expanded
nodes are non-decreasing.

Theorem 1. Algorithm CS* is admissible. 

Proof. Suppose the algorithm terminates with goal node $\gamma \in \Gamma$ and solution cost $g(\gamma) =
minP(\gamma, PP_{s-\gamma}) > C^*$. When γ was chosen for expansion,

$minP(\gamma, PP_{s-\gamma}) \le minP(n, PP_{s-n}) \quad \forall n \in OPEN$.

Therefore, immediately prior to termination, all nodes on OPEN satisfied the condition

$minP(n, PP_{s-n}) \ge minP(\gamma, PP_{s-\gamma}) > C^*$.

But according to the previous lemma, there must have existed on OPEN, just before
termination, at least one node n' on an optimum path with $C^* \in Pset(n', PP_{s-n'})$, i.e.,
$minP(n', PP_{s-n'}) \le C^*$. This is a contradiction. Hence algorithm CS* must terminate with an
optimum solution, and therefore it is admissible. □

The minP value of a node obtained by using pathint can be effectively used as the estimate
of a node during the node selection process, and provides a stronger estimate than minf,
which makes use of pathmax.

Lemma 4. CS*(h) never expands more nodes than algorithm A*(h) that uses minf as the
heuristic function.

Proof. For any node n selected for expansion by CS*, $minP(n, P) \le f^*$ if n lies on an
optimum solution path to the goal along path P.

Because $Pset(n, P) \subseteq fset(n, P)$, $minf(n, P) \le minP(n, P) \le f^*$. A*(h) expands all
nodes n such that $f(n) < C^* = f^*$.

CS*(h) does not expand any node n such that $f(n) > C^*$.

Also, if A* and CS* use the same tie-breaking rule, CS* does not expand any more nodes
with $f(n) = C^*$ than A* does.

In fact, A* is a special case of CS*: $A^*(h) = CS^*(h')$, where $h^*set_{h'}(n)$ is the
continuous range from $h(n)$ to $\infty$. □
In some cases CS* expands fewer nodes than A*. The extra pruning in CS* can be attributed
to two factors:

1. By using pathint, it takes the intersection of the possible values of f for every node
along a path. For a good set of features h, the pathint for off-track nodes may be empty.
This results in pruning of these paths.

2. By using a global upper bound on the value of f*, it is possible to further tighten the
value of pathint at nodes, which is effective in reducing the size of the OPEN set.

Size of OPEN: Let T be the frontier set of the implicit search tree generated by A*, i.e.,
the set of all tip nodes. In the case of A*, OPEN contains all successors of T that are not
members of T. In CS*, OPEN contains only those successors of T s.t. $Pset(n, P) \ne \emptyset$
and $minP(n, P) \le minub$.

Also, we have proved that any node expanded by CS* is also expanded by A*, which
means that the largest size of the OPEN set for algorithm CS* cannot be more than that of
A*, but it can be less.

4.2d When is a heuristic determiner more powerful than another? In the general case,
it is difficult to evaluate the relative merits of two heuristic determiners, $h_1$ and $h_2$. Loosely
speaking, $h_1$ is better than $h_2$ if $h_1$ discriminates between on-track and off-track nodes
more effectively. However, it is possible to make some simple observations:

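Lemma 5. A heuristic determiner $h_1$ dominates a heuristic determiner $h_2$ if, for every node n,

$h^*set_{h_1}(\mathbf{k}_1) \subseteq h^*set_{h_2}(\mathbf{k}_2)$,

where $\mathbf{k}_1$ and $\mathbf{k}_2$ are the feature vector values of n under $h_1$ and $h_2$, and $h^*set_{h_i}$ denotes the h*set when using $h_i$ as the feature set.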

We have assumed that the cardinality of h*set is finite and reasonable, so that it is possible
to store the set directly. If h* assumes real values, it may be the case that the range of h*set
is continuous. In this case we can just keep the lower bound and upper bound of the interval.
This is equivalent to using the lower bound (the value of minf) as the heuristic estimate and
using the upper bound to compute minub. A suitable representation can also be devised
if the range of h*set is piecewise continuous.
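For the continuous case just described, one possible sketch carries each h*set, fset and pathint as a closed interval (lo, hi): the fset is the h*set interval shifted by g, the pathint is an interval intersection clipped at minub, its lower end plays the role of minP, and the upper end of the fset is what would feed minub. The tuple representation and the names `f_interval` and `extend_pathint` are assumptions for illustration; a piecewise-continuous h*set would need a list of such intervals.

```python
import math
from typing import Optional, Tuple

Interval = Tuple[float, float]  # closed interval [lo, hi]


def f_interval(g: float, h_interval: Interval) -> Interval:
    """fset of a node reached at cost g when its h*set is the interval h_interval."""
    lo, hi = h_interval
    return (g + lo, g + hi)


def extend_pathint(pset: Optional[Interval], g: float, h_interval: Interval,
                   minub: float = math.inf) -> Optional[Interval]:
    """Intersect the parent's pathint with the child's fset and clip at minub.

    Returns None when the resulting pathint is empty (the branch can be pruned).
    """
    if pset is None:
        return None
    f_lo, f_hi = f_interval(g, h_interval)
    lo = max(pset[0], f_lo)
    hi = min(pset[1], f_hi, minub)
    return (lo, hi) if lo <= hi else None
```

Keeping only two numbers per node makes each pathint update constant time, at the cost of losing any gaps inside the original set of feasible values.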

4.2e Experimental results: In table 3 we show how running the CS* algorithm with
the information of h*set corresponding to each feature vector value can further reduce
node expansions. The results can be compared with those of table 2.

Table 3. Results of running CS* with h*set for various feature sets.

Features used                                   % Nodes expanded   Number of classes
Manhattan                                       100                25
Manhattan, misplaced tiles                      75                 78
Manhattan, sequence count                       69                 168
Manhattan, misplaced tiles, sequence count      35                 403

4.3 CIDS* - Iterative deepening version

Consider the iterative deepening version of the above algorithm. In an iterative deepening
algorithm, a threshold is selected for the current iteration and depth first search is executed
until the estimates of the frontier nodes cross the threshold. For an iterative deepening
algorithm, at the end of any iteration, the OPEN list defines the frontier of the implicit
search tree generated, which consists of all the nodes that have been generated but not yet
expanded.

DEFINITION 8

We define the frontier set $T_i$ at the end of the $i$th iteration of an iterative deepening algorithm to be the set of nodes whose minP values exceed the cutoff (threshold) for that iteration, while the minP values of their parents lie within the threshold.
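Definition 8 can be read as a filter over the generated nodes. The sketch below assumes a search tree (one parent per node) held in hypothetical `parent` and `minp` dictionaries, and encodes a node with an empty Pset as having minP = infinity so that it always exceeds the threshold; both encodings are our assumptions. The sample values are our reading of the first iteration of Example 5 and Table 4.

```python
import math
from typing import Dict, Hashable, Optional, Set


def frontier_set(parent: Dict[Hashable, Optional[Hashable]],
                 minp: Dict[Hashable, float],
                 threshold: float) -> Set[Hashable]:
    """Nodes whose minP exceeds the threshold while their parent's minP does not."""
    frontier = set()
    for node, par in parent.items():
        if par is None:  # the start node has no parent
            continue
        if minp[node] > threshold and minp[par] <= threshold:
            frontier.add(node)
    return frontier


# First iteration of Example 5 (threshold 4); an empty Pset is encoded as minP = inf.
parent = {"n1": None, "n2": "n1", "n3": "n1", "n4": "n1",
          "y": "n3", "n7": "n3", "n6": "n4", "n8": "n6"}
minp = {"n1": 4, "n2": 6, "n3": 4, "n4": 4, "n6": 4,
        "y": math.inf, "n7": math.inf, "n8": math.inf}
print(sorted(frontier_set(parent, minp, 4)))  # ['n2', 'n7', 'n8', 'y']
```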

Thus the set $T_i$ defines a cut such that the start node s lies on one side and any goal y, if it exists, lies on the other side. If there exists a solution to the problem, any optimum path to the solution must pass through at least one node in the cut. Also, when the shallowest such node m was expanded, its g-value was $g^*(m)$ (by Corollary 2). Therefore the Pset value of node m will include the optimum solution cost. This motivates the concept of front-union.

DEFINITION 9 

The front-union of the $i$th iteration of the iterative deepening CS* algorithm is defined to be the set of f values in the union of the pathint values of all nodes defining the cut:

$$Funion(T_i) = \bigcup_{n \in T_i} Pset(n, PP_{s-n}).$$
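Computing the front-union is a plain set union over the cut. A minimal sketch, assuming Psets are held as frozensets of integer costs in a hypothetical `pset` dictionary; the sample frontier is our reading of the end of the first iteration in Example 5.

```python
from typing import Dict, FrozenSet, Hashable, Iterable


def front_union(cut: Iterable[Hashable],
                pset: Dict[Hashable, FrozenSet[int]]) -> FrozenSet[int]:
    """Funion(T_i): union of the Pset values of all nodes defining the cut."""
    result: FrozenSet[int] = frozenset()
    for n in cut:
        result = result | pset[n]
    return result


# Frontier of the first iteration of Example 5: only n2 still has a non-empty Pset.
pset = {"n2": frozenset({6}), "y": frozenset(), "n7": frozenset(), "n8": frozenset()}
print(sorted(front_union(["n2", "y", "n7", "n8"], pset)))  # [6]
```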

CIDS* is an iterative deepening algorithm that makes use of front-union. It is an extension of the IDA* algorithm, with pathint calculation and pruning by minub like that of CS*. At the end of an iteration, the front-union is computed. This value is backed up as the Pset of the start node s at the beginning of the next iteration, so that the knowledge gained in one iteration is propagated to subsequent iterations. We present our justification of the algorithm below:

A very important property of front-union is that if there exists a solution to the problem, 
the front-union must contain the optimum solution cost. 

Lemma 6. If there exists a path from s to a goal $y \in \Gamma$, and a goal has not yet been found after the $i$th iteration of CIDS*, then the optimum solution cost $C^*$ must be an element of the $i$th front-union set $Funion(T_i)$.
Proof. We have noted that $T_i$ defines a cut in the search space. If there exists a path from s to a goal node $y \in \Gamma$, the cut must intercept one of the nodes on the optimum solution path. Let that node be m. When node m was selected for expansion, the optimum path to m had already been found. Therefore $g(m, PP_{s-m}) = g^*(m)$. By definition of h*set, $h^*(m) \in h^*set(hvec(m))$. Therefore $f^*(m) \in fset(m, PP_{s-m})$. Since an extension of $PP_{s-m}$ is an optimum cost path, $f^*(m) \in Pset(m, PP_{s-m})$, and hence $C^* \in Pset(m, PP_{s-m})$. Thus

$$\exists\, m \in T_i \text{ s.t. } C^* \in Pset(m, PP_{s-m}),$$

and therefore $C^* \in Funion(T_i)$. □


The above lemma implies that, for subsequent iterations, a cost value which is not included in $Funion(T_i)$ need not be considered. This means that for the next iteration we can set

$$Pset^{i+1}(s, (s)) = Pset^{i}(s, (s)) \cap Funion(T_i).$$

But $Funion(T_i) \subseteq Pset^{i}(s)$. Therefore

$$Pset^{i+1}(s) = Funion(T_i).$$
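The update of the start node's Pset between iterations can be sketched as follows. Taking the next threshold as the smallest remaining cost is our reading of Example 5, and returning `None` for an empty result is our own convention. It relies on Lemma 6: if no goal has been found and the front-union is empty, no solution path can exist. The function name is hypothetical.

```python
from typing import FrozenSet, Optional, Tuple


def next_iteration(pset_start: FrozenSet[int],
                   funion: FrozenSet[int]) -> Optional[Tuple[FrozenSet[int], int]]:
    """Pset of the start node and threshold for iteration i+1 of CIDS*."""
    new_pset = pset_start & funion  # Pset^{i+1}(s) = Pset^i(s) ∩ Funion(T_i)
    if not new_pset:
        # Lemma 6 (contrapositive): no goal found and empty front-union => no solution.
        return None
    return new_pset, min(new_pset)  # threshold: smallest cost still possible


# Example 5: Pset(s) = {4, 6, 11} and Funion(T_1) = {6} give Pset {6} and threshold 6.
print(next_iteration(frozenset({4, 6, 11}), frozenset({6})))
```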

Example 5. We show how the algorithm CIDS* works for the example graph of figure 1.

• At the beginning of the first iteration, Funion = (4, 6, 11) and threshold = 4. The nodes expanded are

1. n1, path = (n1), Pset = (4, 6, 11),

2. n3, path = (n1, n3), Pset = (4, 11),

3. n4, path = (n1, n4), Pset = (4, 11),

4. n6, path = (n1, n4, n6), Pset = (4, 11).

The nodes that define the frontier at the end of the first iteration are:

1. n2, path = (n1, n2), Pset = (6),

2. y, path = (n1, n3, y), Pset = ∅,

3. n7, path = (n1, n3, n7), Pset = ∅,

4. n8, path = (n1, n4, n6, n8), Pset = ∅.

Therefore $T_1 = \{n_2, y, n_7, n_8\}$ and $Funion(T_1) = (6)$.

• At the beginning of the second iteration, Funion = (6).

Threshold = $\min Funion(T_1) = 6$.

The nodes expanded are

1. n1, path = (n1), Pset = (4, 6, 11),

2. n2, path = (n1, n2), Pset = (6),

3. n3, path = (n1, n2, n3), Pset = (6),

4. y, path = (n1, n2, n3, y), Pset = (6).
Table 4. Nodes expanded by CIDS* in the example of figure 1.

Iteration #   Threshold   Nodes expanded by CIDS*   Funion(T_i)
1             4           n1, n3, n4, n6            {6}
2             6           n1, n2, n3, y

The other nodes that can be generated are:

1. n5, path = (n1, n5), Pset = ∅,

2. n7, path = (n1, n2, n3, n7), Pset = ∅,

3. n3, path = (n1, n3), Pset = ∅,

4. n4, path = (n1, n4), Pset = ∅.

In table 4 we show the nodes expanded by algorithm CIDS* for the example in figure 1. For comparison, table 5 presents the nodes expanded by algorithm IDA*, which uses minf as the evaluation function at a node. We note that CIDS* expands nodes n4 and n6 in the first iteration, but they are not expanded again in the second iteration. □

The concept of front-union is useful because it restricts the pathint set of the start node using information from deeper down the tree, which is expected to be more accurate. This may prevent some nodes that are expanded in one iteration from being expanded or generated in subsequent iterations, as illustrated by the above example.


5. δ-Risk algorithm using fuzzy set model

The algorithm presented above is not robust in the presence of incomplete information. Only a completely error-free heuristic determiner can guarantee termination with an optimum solution, and in our learning scheme this cannot be ensured except in the limit. Also, in order to guarantee admissibility, the algorithm uses an over-cautious approach in selecting nodes for expansion. In the learning phase, it is possible to acquire statistics about the distribution of the values of the h*set. This information can then be used to make a better decision about which node to expand next, at the risk of occasionally missing the optimum solution. The risk factor can, however, be supplied as a parameter to the algorithm.



Table 5. Nodes expanded by IDA* in the example of figure 1.

Iteration #   Threshold   Nodes expanded by IDA*
1             4           n1, n3, n4, n6
2             5           n1, n3, n4, n6, n8, n2, n3, n5
3             6           n1, n3, n4, n6, n8, n2, n3, n5, y
We find that if we insist on solving problems faster on average while using the same features, we have to compromise on the quality of the solution obtained. However, we usually like to have some control over the deterioration of solution quality. We describe a scheme in which this loss of quality can be parameterized, so that we have a handle on the quality of the solution cost. In addition to learning the minimum value of the h*set corresponding to every feature vector value, we may learn all the elements of the h*set and their distribution while solving problems. A typical h*set distribution in the domain of the 8-puzzle is shown below.


Typical h*set distribution (8-puzzle) for the feature values h1 = 16, h2 = 7, h3 = 16:

h* value    16      18      20      22      24      26      28
frequency   0.024   0.138   0.289   0.407   0.119   0.020   0.0...


We augment the model outlined previously by keeping the distribution of the h*set. The results for the previous algorithms remain valid for these algorithms.

There have been several algorithms aimed at finding solutions faster with a controlled reduction in solution quality. Pohl (1970) uses an evaluation function $f_w = (1 - w)g + wh$ for $0 < w < 1$. For values of $w > 1/2$, we may expect solutions with less search effort in certain cases. Pearl (1984) has outlined algorithm R*_δ, which works with limited risk using information about the uncertainty of the heuristic function. The uncertainty in the h value at a node is modelled as a probability distribution function at that node. Pearl proposed several alternatives for selecting the node to expand given the value of δ, and showed that these algorithms are δ-risk admissible. Bramanti-Gregor & Davis (1993) propose a statistical method of combining features. The method provides a probabilistic estimate of the upper bound on the solution error by calculating the certainty bounds of the supremums of the standard errors of the predicted values. Bramanti-Gregor & Davis (1992) evaluate the performance of this method using some methods aimed at producing near-optimal solutions with reduced node expansions.

We model concepts similar to those of Pearl (1984), but since we are working with sets, the possibilistic model (Klir & Folger 1988) seems more appropriate in this situation. In the standard probabilistic search algorithms in the literature, a necessary condition for the algorithms to work well is that the elements of the h*set are fairly concentrated around a central function (Chenoweth & Davis 1991). In our model this is not a necessary criterion: a set of features that can effectively discriminate between off-track and on-track nodes by applying the concept of pathint will work well. In the last section, we modelled the h*set corresponding to each value of the feature vector, as well as the fset and the Pset, as ‘crisp sets’. If the members of the h*set are not all equally likely, we may use information about the likelihood of the individual members of the h*set to make a more informed decision, provided that we relax the optimality requirement. We propose to assign ‘membership grades’ to each element of the h*set in proportion to the frequency of its past occurrences, and take this as a measure of the possibility of its future occurrence. Thus the h*set can be modelled as a fuzzy set. We modify the algorithm CS* to work with fuzzy sets: min is taken as the intersection operation, and max as the union operation. A node is selected for expansion by taking the least of the $f_\delta$ values of
the Pset of each node in OPEN, where δ is our confidence parameter. This means that we initially ignore those likely values of f* whose possibility of occurrence is less than δ. Before outlining the algorithm, we give a few definitions.

DEFINITION 10

The distribution of h*set at a node n (denoted by $Dh^*set(n)$) is the normalized⁴ possibility distribution of feasible h* values corresponding to the feature vector value at the node, $hvec(n)$.

$C_{h^*}(x, n)$ is the distribution of the exact or relative number of times that the value of h* was x for nodes with feature vector value $hvec(n)$.

$p_{h^*}(x, n)$ is the possibility density function corresponding to $Dh^*set(n)$; it measures the relative possibility that the value of h* is x for the feature vector value $hvec(n)$.

DEFINITION 11

$Dfset(n, P)$: the distribution of fset at a node n along a path $P_{s-n}$ from the start node to n is the normalized possibility distribution of feasible f* values corresponding to the feature vector value associated with the node, $hvec(n)$, and the g value of the current path at the node.

$C_f(x, n)$ is the distribution of the count of the f-value of a node induced by $C_{h^*}(x, n)$. $p_f(x, n)$ is a density function corresponding to $Dfset$; it measures the relative possibility that the optimum f value through the node is x:

$$p_f(y, n) = p_{h^*}(y - g(n), n).$$

DEFINITION 12

$DPset(n, P)$ is the possibility distribution function of the value of an optimum solution that is an extension of path P:

$$DPset(s, (s)) = Dfset(s),$$

$$DPset(n, P\,|\,m, n) = DPset(m, P\,|\,m) \cap Dfset(n),$$

where min is taken to be the intersection operation.

$p_{Pset}(x, P)$ is the possibility that the value of the optimum solution through the path P is x:

$$p_{Pset}(x, P) = \bigcap_{n \in P} p_f(x, n) = \min_{n \in P} \{p_f(x, n)\}.$$

DEFINITION 13

The $f_\delta$ value of a fuzzy set is defined to be the minimum element of the α-cut of the given set with $\alpha = \delta$. Let A be a fuzzy set with membership function $\mu_A$, and let $A_\alpha$ be the α-cut of A:

$$A_\alpha = \{x \in X \mid \mu_A(x) \ge \alpha\},$$

$$f_\delta = \{x \mid x \in A_\alpha,\; x \le y \;\forall y \in A_\alpha\}.$$

⁴A possibility distribution is said to be normalized if its highest membership grade is 1. To normalize an arbitrary possibility distribution, we find the peak(s) of the possibility density function and then scale the distribution so that the maximum peak becomes 1.
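Definition 13 translates directly into two small functions. This sketch assumes a fuzzy set is held as a dictionary from value to membership grade, and reads the α-cut with ≥ (the usual non-strict α-cut; a strict cut would use >). The function names and the grades in the example are illustrative.

```python
from typing import Dict, FrozenSet, Optional


def alpha_cut(membership: Dict[int, float], alpha: float) -> FrozenSet[int]:
    """A_alpha = {x | mu_A(x) >= alpha} for a fuzzy set given as value -> grade."""
    return frozenset(x for x, grade in membership.items() if grade >= alpha)


def f_delta(membership: Dict[int, float], delta: float) -> Optional[int]:
    """Minimum element of the delta-cut, or None when the cut is empty."""
    cut = alpha_cut(membership, delta)
    return min(cut) if cut else None


# Illustrative fuzzy set of f values (grades are made up).
A = {16: 0.1, 18: 0.4, 20: 1.0, 22: 0.6}
print(f_delta(A, 0.3), f_delta(A, 0.5), f_delta(A, 1.5))  # 18 20 None
```

Raising δ discards low-possibility costs, so $f_\delta$ can only stay the same or increase; this is what lets a node with a doubtful low cost be deferred.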

5.1 Learning phase

A set of random problems is generated according to the given distribution, and, from the optimal solutions to these problems, the $C_{h^*}(x, h)$ values are updated for all feature vector values h. We use the $C_{h^*}(x, h)$ values in the problem solving phase.


Computation of $Dh^*set(h)$ values: If R is the set of all feasible values of $h^*set(h)$ and $|R| = K$, then in the absence of any examples we let the possibility at each point in the feasible range be 1/K. After encountering N examples of nodes whose feature vector value is h, we update $C_{h^*}(x, h) = n_x$, where $n_x$ is the number of times that the value of h* was x for a state with feature vector value h. We then compute the distribution following Hansson & Mayer (1989):

$$Dh^*set(x, h) = \frac{n_x + 1}{X + 1}, \qquad X = \max_x \{n_x\}.$$
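The counting update and the $(n_x + 1)/(X + 1)$ formula can be sketched as below. The observed h* values and the feasible range (even values from 16 to 28) are hypothetical inputs, and the function name is ours. Note that the most frequent value always gets possibility 1, so the result is already normalized in the sense of footnote 4. With no observations, every feasible value gets 1, which is the uniform 1/K assignment after normalization.

```python
from collections import Counter
from typing import Dict, Iterable


def dh_star_set(observed: Iterable[int], feasible: Iterable[int]) -> Dict[int, float]:
    """Possibility of each feasible h* value for one feature vector value h.

    observed: optimal h* values seen for states with feature vector value h.
    Implements Dh*set(x, h) = (n_x + 1) / (X + 1) with X = max_x n_x.
    """
    counts = Counter(observed)           # n_x
    X = max(counts.values(), default=0)  # count of the most frequent value
    return {x: (counts[x] + 1) / (X + 1) for x in feasible}


# Hypothetical data: six training states with the same feature vector value.
dist = dh_star_set([18, 20, 20, 22, 22, 22], range(16, 30, 2))
print(dist)
```

Here $X = 3$, so the call gives possibility 1.0 for 22, 0.75 for 20, 0.5 for 18, and 0.25 for the unobserved values 16, 24, 26 and 28. The +1 keeps every feasible value possible, which matters for robustness later.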


5.2 Problem solving phase

If $p_1$ is the possibility of having an optimum solution cost of C through node $n_1$, and $p_2$ is the possibility of having an optimum solution cost of C through node $n_2$, a child of $n_1$, then the possibility of having an optimum solution cost of C through the arc $(n_1, n_2)$ is $\min(p_1, p_2)$. If $D_1$ and $D_2$ are the distributions of the fsets of the parent and the child respectively, then the distribution of possible optimum solution costs through the arc is $D_1 \cap D_2$, where $\cap$ uses the min operator to compose individual elements of the distribution. If P is the current path terminated at node n, then the possibility distribution of optimum solution cost values given the path information is

$$\bigcap_{m \in P} Dfset(m, PP_{s-m}),$$

whose possibility density function is

$$p_P(C, n) = \min_{m \in P} \{p_f(C, m)\}.$$
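The two formulas above, the shift $p_f(y, n) = p_{h^*}(y - g(n), n)$ and the pointwise minimum along the path, can be combined in a short sketch. It assumes integer costs and distributions held as dictionaries, and treats a cost missing from some node's distribution as having possibility 0 there; the function names and data are hypothetical.

```python
from typing import Dict, List

Dist = Dict[int, float]  # cost value -> possibility


def f_possibility(hstar: Dist, g: int) -> Dist:
    """p_f(y, n) = p_h*(y - g(n), n): shift the h* distribution by the path cost g(n)."""
    return {h + g: p for h, p in hstar.items()}


def path_possibility(f_dists: List[Dist]) -> Dist:
    """p_P(C) = min over nodes m on the path of p_f(C, m).

    A cost missing from some node's distribution gets possibility 0 there.
    """
    costs = set().union(*f_dists)
    return {c: min(d.get(c, 0.0) for d in f_dists) for c in costs}


# Hypothetical two-node path: start node s (g = 0) and its child n (g = 2).
p_s = f_possibility({10: 0.5, 12: 1.0, 14: 0.2}, g=0)
p_n = f_possibility({8: 1.0, 10: 0.4, 12: 0.3}, g=2)
print(path_possibility([p_s, p_n]))
```

The child's h* values 8, 10 and 12 become f values 10, 12 and 14, and the path distribution assigns possibilities 0.5, 0.4 and 0.2 to costs 10, 12 and 14. Applied one arc at a time, the same minimum gives the incremental pathint rule $\min(r_1, r_2)$.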

To guarantee an optimum solution we have to select for expansion the node in OPEN whose minimum f-value with non-zero possibility in its pathint is the smallest among all nodes in OPEN. If we are prepared to partially trade off the optimality requirement in the hope of reaching a solution quickly, we may ignore those members of the distribution whose possibility values lie below a certain threshold. This motivates us to choose for expansion the node that has the minimum ‘significant’ f-value. The threshold of significance is controlled by the parameter δ, which is a measure of the risk of missing the optimum solution and is called the confidence parameter. If no solution is found with parameter δ, the value of δ is reduced and the algorithm is rerun. This continues until a solution is obtained.


Algorithm FS*_δ

1. [INITIALIZE:] minub = ∞; OPEN = {(s, Dfset(s, (s)), DPset(s, (s)))}, where Dfset(s, (s)) = DPset(s, (s)) = Dh*set(s); CLOSED = ∅; delta = δ.

LOOP:

2. [TERMINATE:] If OPEN is empty, exit with failure.

3. [SELECT:] From all nodes m ∈ OPEN, select for expansion the node with the minimum value of $f_{delta}(m)$. If no such node exists, set delta = delta/2 and repeat step 3.

Call the selected node n.

4. [CHECK FOR GOAL:] If n is a goal node, exit with the solution.

5. [EXPAND and PRUNE:] Expand node n, generating the set M of its successors.

For all m ∈ M:

If m ∉ OPEN ∪ CLOSED,
    add the tuple (m, Dfset(m, P), DPset(m, P)) to OPEN if Pset(m) ≠ ∅.
If m ∈ OPEN ∪ CLOSED,
    direct its pointers along the path of lowest g(m);
    if m required pointer adjustment and was on CLOSED, reopen it.

If max{fset(m)} < minub,
    set minub = max{fset(m)}.

Remove from OPEN those nodes whose lower bound > minub.

6. [CONTINUE:] Go to LOOP.
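Step 3 [SELECT], including the halving of delta, is the part of the algorithm that differs most from CS*. The sketch below covers only that step. It assumes each OPEN entry carries its DPset as a dictionary from f value to possibility. The `min_delta` floor is our own guard so that the loop terminates when no node has any positive possibility, and is not part of the algorithm as stated. All names are hypothetical.

```python
from typing import Dict, Hashable, Optional, Tuple

Dist = Dict[int, float]  # f value -> possibility, i.e. a node's DPset


def f_delta(dist: Dist, delta: float) -> Optional[int]:
    """Smallest f value whose possibility is at least delta (None if there is none)."""
    cut = [x for x, p in dist.items() if p >= delta]
    return min(cut) if cut else None


def select(open_list: Dict[Hashable, Dist], delta: float,
           min_delta: float = 1e-9) -> Tuple[Optional[Hashable], float]:
    """Step 3 [SELECT]: node of OPEN with least f_delta, halving delta when none qualifies."""
    while delta >= min_delta:  # min_delta is our own stopping guard, not in the paper
        scored = [(fd, n) for n, d in open_list.items()
                  if (fd := f_delta(d, delta)) is not None]
        if scored:
            _, best_node = min(scored, key=lambda t: t[0])
            return best_node, delta
        delta /= 2  # "set delta = delta/2 and repeat step 3"
    return None, delta


open_list = {"a": {20: 0.3, 22: 1.0}, "b": {21: 0.6, 24: 1.0}}
print(select(open_list, 0.5))  # ('b', 0.5): f_delta is 22 for a and 21 for b
print(select(open_list, 0.2))  # ('a', 0.2): f_delta is 20 for a and 21 for b
```

The example shows the risk trade-off: at δ = 0.5 the low-possibility cost 20 at node a is ignored and b is preferred, while at δ = 0.2 that cost becomes significant and a is expanded first.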


Computation of the pathint value at a node: If $p_{Pset}(y, parent) = r_1$ and $p_f(y, child) = r_2$, then $p_{Pset}(y, child) = \min(r_1, r_2)$.


FS*_δ selects for expansion the node with the minimum value of $f_\delta(n)$. δ is defined to be our confidence parameter; it is a measure of the risk associated with missing a solution. Suppose FS*_δ has found a solution of cost C while some node m in OPEN has $f_0(m) < C$. What is the possibility that we would have found a solution with cost less than C by expanding m? Our knowledge about the cost of optimum solutions containing the path $P_{s \ldots m}$ is represented by $DPset(m, PP_{s-m})$. Therefore this possibility can be computed as

$$\max\{p \mid p = p(x, m) \;\&\; x < C\}.$$

If $C^* < f_\delta(n)$, this possibility is less than δ. In other words, the possibility of the current solution being an optimum solution is greater than $1 - \delta$.
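The possibility of having missed a cheaper solution through an OPEN node m is a single maximum over the costs below C. This is a minimal sketch, assuming m's DPset is held as a dictionary from cost to possibility; the function name and the numbers are hypothetical.

```python
from typing import Dict


def possibility_of_cheaper(path_dist: Dict[int, float], cost: int) -> float:
    """max{p | p = p(x, m) and x < C}: possibility that expanding m yields a cheaper solution."""
    return max((p for x, p in path_dist.items() if x < cost), default=0.0)


# Hypothetical DPset of an OPEN node m, after a solution of cost C = 22 was found.
dp_m = {20: 0.1, 22: 0.3, 24: 1.0}
print(possibility_of_cheaper(dp_m, 22))  # 0.1
```

As a worked check: with δ = 0.2 the δ-cut of this distribution starts at 22, so m was never preferred over the solution of cost 22. The returned possibility 0.1 of a cheaper solution through m is indeed below δ.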


5.3 Properties of FS*_δ

Observation 4. CS* is a special case of FS*_δ with δ equal to a small positive value. When Dh*set is computed as above, FS*_δ with δ = 0 is equivalent to A*(h), where h is the underestimating heuristic used in assigning the values of Dh*set.

Lemma 7. The algorithm is robust and can always be made to terminate with a solution, even if the h*set is not complete.

Proof. While assigning the possibility distribution we allow for sufficient noise tolerance and assign some possibility value, however small, to values of h* that have even the remotest chance of being included in the set. Under this condition, there will always exist some positive value of δ for which the algorithm finds a solution. Since the algorithm tries progressively smaller values of δ, termination with a solution is ensured. □


Table 6. Performance of algorithm FS* for different amounts of learning and different values of δ versus the performance of A*.

NE: average relative percentage of nodes expanded (compared to A*(h1)).
SO: average relative percentage of suboptimum solutions found (compared to A*(h1)).

        FS*(h1, h2)         FS*(h1, h3)         FS*(h1, h2, h3)
δ       Ave NE   Ave SO     Ave NE   Ave SO     Ave NE   Ave SO
0.1     75.6     0.3        75.9     0.1        39.5     0.2
0.2     74.3     1.4        75.8     0.1        37.5     0.5
0.3     71.4     3.0        73.8     1.0        37.4     0.6
0.4     69.3     3.0        70.1     2.1        35.3     2.8
0.5     64.7     5.1        68.2     3.9        33.8     4.1
0.6     -        -          58.0     10.8       32.7     6.2
0.7     60.5     8.0        50.6     13.5       31.2     8.9
0.8     53.5     13.9       48.2     15.9       30.0     12.0
0.9     53.3     15.6       42.0     18.6       27.1     17.1
1.0     53.1     20.2       39.1     23.0       25.8     17.5
6. Conclusion

This paper addresses the issue of learning in admissible search. We have studied the information requirement for best-first search. To this end, we have distinguished the domain information, given by the feature values at the nodes, from the heuristic determiner. The latter is learned from the former, and both can be of quite general form; the goal is to design a powerful determiner for a given domain that can discriminate effectively between on-track and off-track nodes. The features can be arbitrary, and we have assumed a tabular form for representing the heuristic determiner. Algebraic formulae for combining the features give constant-space representations, but a particular algebraic form may not be suitable for effectively combining a given set of features for a domain. Furthermore, such formulae cannot serve as a model for powerful algorithms like CS*. Tabular representations, however, may lead to large space requirements.
Abstract. We present our machine learning system that uses inductive logic programming techniques to learn how to identify transmembrane domains from amino acid sequences. Our system facilitates the use of operators such as ‘contains’ that act on entire sequences rather than on individual elements of a sequence. The prediction accuracy of our new system is around 93%, which compares favourably with earlier results.

Keywords. Machine learning; inductive logic programming; transmembrane 
domains; amino-acid sequences; molecular biology. 


1. Introduction 

An inductive logic programming system (Muggleton 1991) is a machine learning system that uses background knowledge relevant to the application area, together with a set of examples and counter-examples of a particular concept, to learn a description of that concept in the form of a set of Horn clauses or a Prolog program.

In this paper we present our implementation of a machine learning system that uses inductive logic programming techniques to learn how to identify transmembrane domains from amino acid sequences. Transmembrane identification is an important protein classification problem. We describe the work done by Shimozono et al (1993) on the identification of transmembrane domains, and then present the results obtained using our machine learning system. The prediction accuracy of our new system is around 93%. This is a very good result for a classification problem, and it compares favourably with the earlier results, which were of similar accuracy.

Our machine learning system extends earlier inductive logic programming techniques by facilitating the use of operators that act on entire sequences rather than on individual elements of a sequence. In this application we use operators such as ‘contains’ and ‘abstains’, which are true when a test string is, respectively, contained or not contained in a target string. However, these are not the only operators that can be used, and other operators that are useful in a biological context can be studied.
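As a minimal sketch of the two sequence-level operators, assume ‘contained’ means occurrence as a contiguous substring, which matches the ‘xAy’ pattern form used by Shimozono et al. Under that assumption the operators reduce to substring tests on the whole sequence rather than on individual residues. The Python function names mirror the operator names but are our own, and the sample sequence is hypothetical.

```python
def contains(test: str, target: str) -> bool:
    """'contains': true when the test string occurs as a contiguous substring of the target."""
    return test in target


def abstains(test: str, target: str) -> bool:
    """'abstains': true when the test string does not occur in the target."""
    return test not in target


# Hypothetical amino acid strings in one-letter code.
segment = "MALLVLGAAL"
print(contains("LLV", segment), abstains("KKR", segment))  # True True
```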
This paper describes the use of our new inductive logic programming technique for the identification of transmembrane domains. However, the technique is general and powerful, and can be used to learn many biological and other concepts in sequence classification contexts.


2. Machine learning 

Machine learning is the study and computer modelling of learning processes (Michalski et al 1983). Learning processes include the acquisition of new declarative knowledge, the development of motor and cognitive skills through instruction or practice, the organization of new knowledge into general, effective representations, and the discovery of new facts and theories through observation and experimentation.

Machine learning systems can be classified along many different dimensions. One of these dimensions is the underlying learning strategy. The different strategies can be ordered by the amount of inference the learning system performs on the available information: some involve very little inference, whereas others entail a great deal.

In ‘Rote Learning’, or direct implanting of new knowledge, there is no inference or transformation of the knowledge by the learner; the new knowledge is given directly to the learner, which uses it as is. ‘Learning from Instruction’ (or learning by being told) is the acquisition of knowledge from a teacher or some other organized source such as a textbook. The learner has to transform the knowledge from the input source into some internal representation, in such a way that the new knowledge integrates well with any existing knowledge. In ‘Learning by analogy’, the learner uses existing knowledge that bears a strong similarity to the desired new concept, and transforms or augments this knowledge to learn the new concept. In ‘Learning from examples’, given a set of examples and counter-examples of a concept, the learner induces a general concept description that describes all of the positive examples and none of the negative examples. ‘Learning from observation’ includes discovery systems, theory formation tasks, the creation of classification criteria to form taxonomic hierarchies, and other similar tasks that are done without the benefit of an external teacher. The learner is not provided with a set of instances of a particular concept, nor is it given access to an oracle that can classify internally generated instances as positive or negative instances of the given concept.

The underlying learning strategy is only one of the dimensions along which machine learning systems can be classified. They can also be classified according to the type of knowledge acquired and according to the domain of application.

Michalski (1983) describes inductive learning as the process of acquiring knowledge by drawing inductive inferences from facts provided by either a teacher or the environment. Conceptual inductive learning is then described as a type of inductive learning whose final products are symbolic descriptions expressed in high-level, human-oriented terms and forms. Concept learning from examples (also called concept acquisition) is the task of inducing general descriptions of concepts from specific instances of these concepts. These instances are preclassified by a teacher into one or more classes (concepts). The induced description, or hypothesis, for a given concept is such that if an object satisfies this description, then it represents the concept.

Concept acquisition can learn a single concept or a collection of concepts. In single concept learning, one can learn from positive examples alone, or from positive and negative examples (examples and counter-examples). When learning is from positive examples alone, there are no counter-examples, so there is no natural limit to which the description can be generalized, and the learnt description may be over-generalized. When counter-examples are also provided, there is an obvious limit on the extent to which the hypothesis can be generalized. The most useful counter-examples are the ‘near-misses’ that differ only slightly from positive examples, since these clearly limit the extent to which the learnt hypothesis can be generalized.

Background knowledge defines the assumptions and constraints imposed on the observed facts and generated inductive assertions, together with any relevant problem domain knowledge. In addition to describing the concept, the learnt hypothesis has to be consistent with the background knowledge.

3. Inductive logic programming 

Muggleton (1991) describes inductive logic programming as the research area formed at 
the intersection of logic programming and machine learning, with work being done on 
problems of inductive reasoning within the confines of pure Prolog. Muggleton & Feng 
(1990) present the inductive logic programming learning algorithm GOLEM. Given a set 
of positive and negative examples, and background knowledge in the form of Horn clauses, 
GOLEM constructs a set of hypothesized Horn clauses which together cover all (or most) positive examples and no (or very few) negative examples. GOLEM is implemented in C. All the data are explicitly given as background knowledge.

Quinlan (1990, 1991) presents FOIL (first-order inductive learner), whose input is information about one or more relations. One of these relations is the target relation, which is to be defined by a Horn clause program. A set of tuples of constants that are in the target relation is given. A set of tuples that are not in the target relation may also be given, or the closed world assumption may be used to generate such tuples. FOIL's output is a Horn clause program that defines the target relation.

FOIL and GOLEM belong to a group of learning algorithms that are described as covering methods. They construct a classification rule in the form of a disjunctive expression (a set of Horn clauses, or a Horn clause logic program). The covering method works as follows:

- A conjunction of conditions is found, such that it is satisfied by some examples in the 
target class, but no (or very few) examples outside the target class. 

- This conjunction is appended as a disjunct of the classification rule. 

- All examples that satisfy this conjunction are removed, and if there are still examples 
of the target class remaining to be covered, the procedure is repeated. 
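The three covering steps can be sketched in a propositional setting. This is only an illustration, not FOIL's or GOLEM's actual search. Each condition is assumed to be a ‘contains’ test for a substring. A hypothetical greedy `learn_conjunction` builds each conjunction by excluding as many negatives as possible while keeping at least one positive, and the outer loop removes covered positives and repeats. All names and data are made up.

```python
from typing import List, Sequence, Set


def holds(cond: str, example: str) -> bool:
    # Hypothetical condition type: a 'contains' test for a substring.
    return cond in example


def learn_conjunction(pos: Set[str], neg: Set[str], candidates: Sequence[str]) -> List[str]:
    """Greedily add conditions until no negative example satisfies the conjunction."""
    conj: List[str] = []
    p, n = set(pos), set(neg)
    while n:
        useful = [c for c in candidates if any(holds(c, e) for e in p)]
        if not useful:
            break
        # Exclude as many negatives as possible, then keep as many positives as possible.
        best = max(useful, key=lambda c: (sum(not holds(c, e) for e in n),
                                          sum(holds(c, e) for e in p)))
        new_n = {e for e in n if holds(best, e)}
        if len(new_n) == len(n):
            break  # no candidate excludes another negative
        conj.append(best)
        p = {e for e in p if holds(best, e)}
        n = new_n
    return conj


def covering(pos: Set[str], neg: Set[str], candidates: Sequence[str]) -> List[List[str]]:
    """Disjunction of conjunctions; covered positives are removed after each round."""
    rule: List[List[str]] = []
    remaining = set(pos)
    while remaining:
        conj = learn_conjunction(remaining, neg, candidates)
        covered = {e for e in remaining if all(holds(c, e) for c in conj)}
        if (neg and not conj) or not covered:
            break  # no progress possible
        rule.append(conj)
        remaining -= covered
    return rule


print(covering({"AAL", "ALL", "KKE"}, {"KAE", "EEE"}, ["A", "L", "K", "E", "KK"]))
```

On this toy data the first round picks ‘contains L’, which covers AAL and ALL and no negatives. The second round picks ‘contains KK’ for the remaining KKE, giving the rule [['L'], ['KK']].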
GOLEM determines each conjunction of conditions by taking the best conjunction (the one that covers the most positive examples while covering fewer than a specified number of negative examples) among those formed from a random sample of pairs of examples, where each conjunction is formed by choosing the background predicates that are true for both examples in the pair. The coverage of this best conjunction is further improved by using other examples in a random set of examples.
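The pair-sampling step can be illustrated with a propositional simplification. It is not GOLEM's actual relative least general generalisation over Horn clauses, and the later coverage-improvement step is omitted. Each example is assumed to be encoded as the frozenset of background predicates true of it, so the conjunction for a pair is the intersection of the two sets. All names and predicate labels are hypothetical.

```python
import random
from itertools import combinations
from typing import Dict, FrozenSet, Optional

Facts = FrozenSet[str]  # background predicates true of one example (hypothetical encoding)


def best_pair_conjunction(pos: Dict[str, Facts], neg: Dict[str, Facts],
                          n_pairs: int, max_neg: int, seed: int = 0) -> Optional[Facts]:
    """Best conjunction among those built from a random sample of positive pairs."""
    rng = random.Random(seed)
    pairs = list(combinations(sorted(pos), 2))
    best, best_cov = None, -1
    for a, b in rng.sample(pairs, min(n_pairs, len(pairs))):
        conj = pos[a] & pos[b]  # predicates true for both examples in the pair
        # A conjunction covers an example when all its predicates hold for it.
        if sum(conj <= f for f in neg.values()) >= max_neg:
            continue  # covers too many negative examples
        cov = sum(conj <= f for f in pos.values())
        if cov > best_cov:
            best, best_cov = conj, cov
    return best


pos = {"p1": frozenset({"hydrophobic", "len20", "charged_end"}),
       "p2": frozenset({"hydrophobic", "len20"}),
       "p3": frozenset({"hydrophobic", "len20", "aromatic"})}
neg = {"n1": frozenset({"len20", "charged_end"}), "n2": frozenset({"hydrophobic"})}
print(sorted(best_pair_conjunction(pos, neg, n_pairs=3, max_neg=1)))  # ['hydrophobic', 'len20']
```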

To sum up, inductive logic programming is a concept acquisition area in Machine 
Learning and the learning strategy used is that of ‘Learning from examples’. Positive 
as well as negative examples are normally used. Background knowledge is also used. The 
knowledge acquired is in the form of a set of Horn clauses or a Prolog program. 


4. Biological applications of inductive logic programming 

The ILP algorithm GOLEM has been applied by Muggleton et al (1992) to the prediction 
of secondary structure from protein primary structure. This application predicts whether 
a particular residue is in an alpha-helix secondary structure or not. Positive examples are 
residue positions in the protein sequences that are in an alpha-helix secondary structure. 
Negative examples are those that are not. The background knowledge contains a lot of 
information about the protein structure, largely primary structure information. GOLEM is used to generate a logic program that can predict whether a particular residue from a test protein sequence (which need not have been part of the training set used to learn the program) is in an alpha-helix secondary structure or not. Prediction accuracies of over 80% were achieved.

GOLEM is also used by King et al (1992) to model the quantitative structure-activity 
relationship (QSAR) in a related series of ligands. This is useful for drug design. Ranking 
of activities of drugs is done by considering pairs of drugs and comparing their activities. 
Thus the positive examples are paired examples of greater activity (where the first element 
of the pair is more active than the second element). Negative examples are paired examples 
of lower activity. Background facts are the chemical structure of the drugs and the properties 
of the substituents. GOLEM derived nine rules that predict the relative activities of two 
drugs. 

GOLEM has not been applied to entire sequences. In the alpha-helix prediction, each 
element in the sequence is considered individually; each residue in the protein sequence is 
considered as an individual example or test case. However, many biological applications 
involve sequences and it becomes difficult to handle all the sequence applications by 
treating the elements of the sequence individually. There is a need for applying inductive 
logic programming techniques to entire sequences. 

When the general description of a particular concept is learnt, there may be exceptions to the general description. Different sets of exceptions may each be specific to one of the many general rules that together, as a disjunction, describe the concept. These are local exceptions (Siromoney & Siromoney 1993). Local exceptions can be considered a special case of the more general idea of variations (Siromoney & Siromoney 1995). A particular concept may be general in nature and of interest to many researchers, and a lot of effort is spent in learning it. Another concept to be learnt may be similar to it, a variation of the first concept, and can be learnt with much less effort by making use of the rules learnt to describe the earlier concept. The role of such variations is illustrated with a biological example. Identification of signal peptide sequences is learnt using mammalian data (from mammals other than rodents or primates). This knowledge is then used to easily learn the identification of signal peptides for primate data. In this illustrative experiment it was also found that the prediction results were better than those obtained by learning from the primate data directly.


5. Identification of transmembrane domains 

Arikawa et al (1992) and Shimozono et al (1993) describe a new approach to the
identification of transmembrane domains from amino acid sequences. Transmembrane
identification is a very important protein classification problem. The PIR database
contains amino acid sequences, with the FEATURE field of each sequence indicating
where its transmembrane domains are located. The amino acid sequences are cut into
substrings in such a manner that positive example strings lie entirely within
transmembrane domains, and negative example strings lie completely outside them. A
decision tree is then learnt that can classify any new substring as belonging to a
transmembrane domain or not.
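One simple way to perform the cutting is sketched below in C++. The exact cutting rule used by Arikawa et al is not described here, so this is an assumption: domains are given as sorted, non-overlapping, 0-based half-open intervals (a hypothetical encoding of the FEATURE field), and each sequence is cut exactly at the domain boundaries. `Domain`, `CutExamples` and `cut_sequence` are illustrative names only.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical encoding of one annotated transmembrane domain:
// 0-based start position, end position exclusive.
struct Domain {
    std::size_t start;
    std::size_t end;
};

struct CutExamples {
    std::vector<std::string> positives;  // pieces inside a domain
    std::vector<std::string> negatives;  // pieces outside every domain
};

// Cuts a sequence at the domain boundaries. Domains must be sorted and
// non-overlapping; every piece lies entirely inside or entirely outside a domain.
CutExamples cut_sequence(const std::string& seq, const std::vector<Domain>& domains) {
    CutExamples ex;
    std::size_t pos = 0;
    for (const Domain& d : domains) {
        assert(pos <= d.start && d.start < d.end && d.end <= seq.size());
        if (d.start > pos) ex.negatives.push_back(seq.substr(pos, d.start - pos));
        ex.positives.push_back(seq.substr(d.start, d.end - d.start));
        pos = d.end;
    }
    if (pos < seq.size()) ex.negatives.push_back(seq.substr(pos));
    return ex;
}
```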

The simple form ‘xAy’ of a regular pattern language is used in the nodes of a decision
tree; here ‘x’ and ‘y’ are variable substrings and ‘A’ is a given fixed substring. Thus this
simple form tests whether the fixed substring ‘A’ is ‘contained’ in the target string. The
‘contains’ operator has been studied in detail by Sakakibara & Siromoney (1992); it is
true when the search string is contained in the target string. The ‘Y’ path from a node of
the decision tree is taken when the target string is of the form ‘xAy’, that is, when ‘A’ is
contained in the target string, and the ‘N’ path is taken otherwise. The decision tree is
learnt using a modification of the ID3 algorithm presented by Quinlan (1986).
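The classification step of such a tree can be sketched in a few lines of C++. The sketch only shows how an already learnt tree is applied to a target string; the ID3-based learning is not shown, and the names `Node`, `leaf`, `branch` and `classify`, as well as the small tree in the test, are hypothetical. Each internal node stores the fixed substring ‘A’, and the ‘Y’ branch is followed when ‘A’ occurs in the target, i.e. when the target has the form ‘xAy’.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

// A decision-tree node. An internal node stores the fixed substring A of the
// pattern xAy; a leaf stores a class label.
struct Node {
    std::string pattern;          // A, at internal nodes
    std::unique_ptr<Node> yes;    // 'Y' path: target has the form xAy
    std::unique_ptr<Node> no;     // 'N' path: A does not occur in the target
    bool label = false;           // at leaves: true = transmembrane

    bool is_leaf() const { return !yes && !no; }
};

// The 'contains' test: true when A is a substring of the target.
bool contains(const std::string& target, const std::string& a) {
    return target.find(a) != std::string::npos;
}

bool classify(const Node& node, const std::string& target) {
    if (node.is_leaf()) return node.label;
    assert(node.yes && node.no);  // internal nodes need both branches
    return classify(contains(target, node.pattern) ? *node.yes : *node.no, target);
}

// Helpers for building a tree by hand.
std::unique_ptr<Node> leaf(bool label) {
    auto n = std::make_unique<Node>();
    n->label = label;
    return n;
}

std::unique_ptr<Node> branch(const std::string& a, std::unique_ptr<Node> yes,
                             std::unique_ptr<Node> no) {
    auto n = std::make_unique<Node>();
    n->pattern = a;
    n->yes = std::move(yes);
    n->no = std::move(no);
    return n;
}
```

For example, `branch("**", branch("-", leaf(false), leaf(true)), leaf(false))` labels a string as positive only if it contains ‘**’ and does not contain ‘-’.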

When the amino acid sequences were used directly as the strings and patterns, 84.8% of
the positive test cases and 89.6% of the negative test cases were correctly classified. The
hydropathy index was then used to divide the amino acids into three distinct categories,
and the twenty-symbol amino acid sequences were transformed into three-symbol
sequences by replacing each amino acid with the symbol of its category. This raised
correct classification to 91.4% of the positive cases and 94.8% of the negative cases. With
a new indexing method, which again mapped the twenty amino acid symbols to three
symbols, 93.3% of the positive cases and 92.4% of the negative cases were correctly
classified.

6. System description

Our new machine learning system uses the GOLEM algorithm. Its main feature is that
the data need not be given explicitly as background knowledge. Instead, the background
knowledge takes the form of C-callable functions that operate on the data corresponding
to a particular example and return a true or false value. This makes the background
knowledge much easier to specify: the raw data is associated with each example or test
case, and the background knowledge, in the form of callable functions, operates on this
raw data and returns a true or false value. It also facilitates the use of operators such as
‘contains’, which returns true when the given search string is contained in (is a substring
of) the target string, and false otherwise. We also use the ‘abstains’ operator, the opposite
of ‘contains’, which is true when the search string is NOT contained in the target string.

Our machine learning system is written in ‘C’. The inputs to our system are the background
knowledge and the positive and negative examples. The background knowledge is in the form
of a C table, in which each element specifies the address of the C function to be called
(which will operate on the particular example and return a true or false value) and, in the
current version, two parameters. The first is a variable parameter, for which a count and a
list of values are specified. The second is a constant parameter. Using the variable first
parameter is equivalent to specifying the same function many times, each time with the
corresponding value from the list. To illustrate: instead of specifying contains(‘++’) and
contains(‘--’) as separate entries in the table of background knowledge, it is possible to
specify them as a single entry whose variable first parameter has a count of 2 and the list
“++” and “--” as its two values. The second, constant parameter is empty in this case.
This simplifies the entry of background knowledge. The system can easily be extended to
cater for more constant parameters.

Each of the C functions used in this table also needs to be written and compiled along 
with this table. The parameters to any of these C functions include those that specify the 
particulars of the current example (or test case), and the two parameters as given in the 
table of background knowledge. 

The examples are given as two text files, one containing all the positive examples and the
other all the negative examples. Each example is given as its character sequence, on a
separate line of the text file.

The system applies the background knowledge by using each entry in the table in turn.
This involves repeatedly invoking the C function through its function address, passing as
parameters the particulars of the current example (or test case), the current element of
the list of first parameters, and the constant second parameter. The function operates on
the particular example, using the two parameters, and returns a true or false value. This
is equivalent to specifying the background knowledge through many ground Prolog
clauses.
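The table-driven mechanism can be sketched in C++ as follows. This illustrates the idea and is not the authors' code: their system is written in C, and the function signature `Predicate` and the names `Entry` and `evaluate` are assumptions. Each entry holds a function address, a list of values for the variable first parameter (its count is the list size) and a constant second parameter; evaluating the table on one example yields one truth value per ground literal.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Assumed signature of a background-knowledge function: it receives the raw
// example string, one value of the variable first parameter and the constant
// second parameter, and returns true or false.
using Predicate = bool (*)(const std::string& example, const std::string& value,
                           const std::string& constant);

// contains: true when `value` is a substring of the example.
bool contains(const std::string& example, const std::string& value, const std::string&) {
    return example.find(value) != std::string::npos;
}

// abstains: true when `value` is NOT a substring of the example.
bool abstains(const std::string& example, const std::string& value, const std::string&) {
    return example.find(value) == std::string::npos;
}

// One table entry: function address, list of values for the variable first
// parameter, and the constant second parameter.
struct Entry {
    Predicate fn;
    std::vector<std::string> values;
    std::string constant;
};

// Invokes every entry once per value of its variable parameter on one example.
// Each result is the truth value of one ground literal, e.g. contains('++').
std::vector<bool> evaluate(const std::vector<Entry>& table, const std::string& example) {
    std::vector<bool> truth;
    for (const Entry& e : table) {
        assert(e.fn != nullptr);
        for (const std::string& v : e.values) truth.push_back(e.fn(example, v, e.constant));
    }
    return truth;
}
```

With a two-entry table (contains and abstains, each over ‘++’ and ‘--’), the example string ‘*++*’ yields four truth values: contains(‘++’) true, contains(‘--’) false, abstains(‘++’) false and abstains(‘--’) true.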

Our comprehensive system includes the machine learning system and also a test bed for
applying the learnt knowledge to a set of classified test cases. Background knowledge,
positive and negative examples, and positive and negative test cases are given as input to
the comprehensive system, which first learns from the background knowledge and
examples, and then predicts the classes of the test cases and measures the accuracy of
prediction.
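A test bed of this kind can be sketched as follows, assuming (as is usual for GOLEM-style Horn clause learning) that the learnt theory is a disjunction of clauses, each a conjunction of contains/abstains literals, and that a test case is predicted positive when some clause covers it. The types `Literal`, `Clause`, `Theory`, `Score` and the function `measure` are hypothetical; correct classifications are counted separately for positive and negative test cases, mirroring how the results are reported.

```cpp
#include <cassert>
#include <string>
#include <vector>

// One literal of a learnt clause: contains(s) or abstains(s).
struct Literal {
    bool contains;  // true: contains(s); false: abstains(s)
    std::string s;
};

using Clause = std::vector<Literal>;  // conjunction of literals
using Theory = std::vector<Clause>;   // disjunction of clauses

bool holds(const Literal& lit, const std::string& x) {
    const bool found = x.find(lit.s) != std::string::npos;
    return lit.contains ? found : !found;
}

// A test case is predicted positive when every literal of some clause holds.
bool predict(const Theory& theory, const std::string& x) {
    for (const Clause& clause : theory) {
        bool covered = true;
        for (const Literal& lit : clause) covered = covered && holds(lit, x);
        if (covered) return true;
    }
    return false;
}

struct Score {
    int correct_pos = 0, total_pos = 0;
    int correct_neg = 0, total_neg = 0;
};

// Counts correct classifications separately for positive and negative test cases.
Score measure(const Theory& theory, const std::vector<std::string>& pos,
              const std::vector<std::string>& neg) {
    Score s;
    s.total_pos = static_cast<int>(pos.size());
    s.total_neg = static_cast<int>(neg.size());
    for (const std::string& x : pos) if (predict(theory, x)) ++s.correct_pos;
    for (const std::string& x : neg) if (!predict(theory, x)) ++s.correct_neg;
    assert(s.correct_pos <= s.total_pos && s.correct_neg <= s.total_neg);
    return s;
}
```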

Our system currently uses the compiled table of C function addresses and the associated
C functions as background knowledge, rather than Prolog program clauses. It also
currently evaluates the induced Horn clauses internally, in their internal format, rather
than running an external Prolog compiler on generated Prolog output. Appropriate
input/output translation modules can be added to accept Prolog-like input and to generate
actual Prolog programs as output; the Prolog compiler will then need a C function interface.

7. Method

Positive and negative examples of the transmembrane data from PIR were kindly supplied
by Prof Miyano. The examples were sorted, duplicates were removed, and the remaining
unique strings were randomized. There were 623 unique positive examples and 19164
unique negative examples. Three tests were conducted, each with 200 positive and 200
negative examples in the training set and the remaining examples in the test set. The first,
second and third blocks of 200 examples were used as the training sets of the three tests.
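The preparation and splitting steps can be sketched in C++. The random generator (`std::mt19937`) and the seed are assumptions, since the randomization method is not stated; `prepare`, `Split` and `make_split` are illustrative names.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <random>
#include <string>
#include <vector>

// Sorts the strings, keeps one copy of each, then shuffles them.
// The generator and the seed are assumptions; the paper does not state them.
std::vector<std::string> prepare(std::vector<std::string> xs, unsigned seed) {
    std::sort(xs.begin(), xs.end());
    xs.erase(std::unique(xs.begin(), xs.end()), xs.end());
    std::mt19937 rng(seed);
    std::shuffle(xs.begin(), xs.end(), rng);
    return xs;
}

struct Split {
    std::vector<std::string> train;
    std::vector<std::string> test;
};

// Test k (0, 1 or 2) trains on the k-th block of n strings and tests on the rest.
Split make_split(const std::vector<std::string>& xs, std::size_t k, std::size_t n) {
    assert((k + 1) * n <= xs.size());
    Split s;
    for (std::size_t i = 0; i < xs.size(); ++i) {
        if (i >= k * n && i < (k + 1) * n) s.train.push_back(xs[i]);
        else s.test.push_back(xs[i]);
    }
    return s;
}
```

With 623 unique positive examples and blocks of n = 200, each test leaves 623 - 200 = 423 positive test cases; likewise 19164 - 200 = 18964 negative test cases remain.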

The background knowledge used modified versions of the contains and abstains operators,
which internally apply one of two translation mechanisms to the raw amino acid sequence
of each example. Both translation mechanisms convert the amino acid strings into
three-symbol strings. The first is based on the hydropathy index of the amino acids, with
the three categories defined by the Kyte and Doolittle hydropathy index. The amino acids
fall into three clear categories: those with a positive hydropathy index, those with a very
negative hydropathy index, and those with a hydropathy index near zero (0.0 to -1.6).
This is the same translation mechanism used initially by Arikawa et al (1992). The second
translation mechanism is based on whether the amino acid is acidic, basic or neutral. The
variable first parameter was used to specify, as the test strings for these operators, all the
possible two- and three-character sequences over the three symbols (“+*+”, and so on).
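Both translation mechanisms and the generation of test strings can be sketched as below. The hydropathy grouping follows the published Kyte and Doolittle values (for example isoleucine 4.5, alanine 1.8, glycine -0.4, proline -1.6, histidine -3.2, arginine -4.5), which fall into the three bands described above; counting histidine as basic in the charge-based translation is our assumption, and the function names are hypothetical. Over three symbols there are 3^2 = 9 two-character and 3^3 = 27 three-character strings, 36 test strings in all for each alphabet.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Kyte-Doolittle bands: '*' positive index, '+' from 0.0 to -1.6,
// '-' more negative. Non-standard letters map to '?'.
char hydropathy_class(char aa) {
    switch (aa) {
        case 'I': case 'V': case 'L': case 'F': case 'C': case 'M': case 'A':
            return '*';
        case 'G': case 'T': case 'S': case 'W': case 'Y': case 'P':
            return '+';
        case 'H': case 'E': case 'Q': case 'D': case 'N': case 'K': case 'R':
            return '-';
        default:
            return '?';
    }
}

// Acidic ('A'), basic ('B') or neutral ('N'); histidine counted as basic (assumption).
char charge_class(char aa) {
    switch (aa) {
        case 'D': case 'E':
            return 'A';
        case 'K': case 'R': case 'H':
            return 'B';
        case 'A': case 'C': case 'F': case 'G': case 'I': case 'L': case 'M':
        case 'N': case 'P': case 'Q': case 'S': case 'T': case 'V': case 'W': case 'Y':
            return 'N';
        default:
            return '?';
    }
}

// Replaces every amino acid by the symbol of its category.
std::string translate(const std::string& seq, char (*category)(char)) {
    std::string out;
    out.reserve(seq.size());
    for (char c : seq) out.push_back(category(c));
    return out;
}

// All strings of length 2 and 3 over a three-symbol alphabet (9 + 27 = 36).
std::vector<std::string> test_strings(const std::string& alphabet) {
    assert(alphabet.size() == 3);
    std::vector<std::string> out;
    for (char a : alphabet)
        for (char b : alphabet) {
            out.push_back(std::string{a, b});
            for (char c : alphabet) out.push_back(std::string{a, b, c});
        }
    return out;
}
```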


8. Results 

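The results of correct classification of the test set are given below. Each test set
contained the 423 positive and 18964 negative examples not used for training; the figures
in parentheses are the numbers of correctly classified test cases.

                               Correct +ve     Correct -ve

Test I   (Training: 1st 200)   92.7% (392)     93.5% (17725)
Test II  (Training: 2nd 200)   93.1% (394)     93.1% (17648)
Test III (Training: 3rd 200)   93.6% (396)     93.2% (17671)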


One of the clauses learnt in the first test is given below as an example of the learnt
rules:

contains('**'), contains('+*'), contains('+**'),
abstains('+-+'), abstains, abstains,
contains('NN'), contains('NNN'),
abstains('AB'), abstains('AA'), abstains('BBB'),
abstains('BBN'), abstains('BNA'), abstains('NBA'),
abstains('ANB'), abstains('ANA'),

* is an amino acid with a positive hydropathy index
+ is an amino acid with a hydropathy index from 0 to -1.6
- is an amino acid with a more negative hydropathy index

A is an acidic amino acid
B is a basic amino acid
N is a neutral amino acid

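Redundant literals, such as abstains(‘AAA’) in ‘abstains(‘AA’), abstains(‘AAA’)’, were
removed by hand to improve the readability of the learnt clause; any string that does not
contain ‘AA’ cannot contain ‘AAA’ either, so abstains(‘AAA’) adds nothing.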
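The hand pruning could be automated with two substring rules, sketched in C++ below: abstains(t) implies abstains(s) whenever t is a substring of s (the case of ‘AA’ and ‘AAA’), and, symmetrically, contains(s) implies contains(t) whenever t is a substring of s. The function `simplify` and the `Literal` type are hypothetical. Applying the second rule as well would, for instance, also drop contains(‘+*’) from the clause above, since contains(‘+**’) implies it.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// One literal of a clause body: contains(s) or abstains(s).
struct Literal {
    bool contains;  // true: contains(s); false: abstains(s)
    std::string s;
};

bool is_substring(const std::string& part, const std::string& whole) {
    return whole.find(part) != std::string::npos;
}

// Drops literals implied by another literal of the same clause:
//   abstains(t) implies abstains(s) when t is a substring of s;
//   contains(s) implies contains(t) when t is a substring of s.
// Of several identical literals only the first is kept.
std::vector<Literal> simplify(const std::vector<Literal>& body) {
    std::vector<Literal> kept;
    for (std::size_t i = 0; i < body.size(); ++i) {
        bool redundant = false;
        for (std::size_t j = 0; j < body.size() && !redundant; ++j) {
            if (i == j || body[i].contains != body[j].contains) continue;
            const std::string& si = body[i].s;
            const std::string& sj = body[j].s;
            if (si == sj) {
                redundant = j < i;  // duplicate: keep the earlier copy
            } else if (body[i].contains) {
                redundant = is_substring(si, sj);  // contains(sj) implies contains(si)
            } else {
                redundant = is_substring(sj, si);  // abstains(sj) implies abstains(si)
            }
        }
        if (!redundant) kept.push_back(body[i]);
    }
    assert(kept.size() <= body.size());
    return kept;
}
```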

The results clearly indicate that our new machine learning system can be used to
determine which amino acid sequences fall within transmembrane domains. The rules
learnt by the system are, however, quite complex, as the example above shows, and need
further analysis by human experts to see whether any new biological results can be derived
from them.

9. Conclusion 

The experimental results obtained by our new machine learning system compare
favorably with the results obtained by using a regular pattern language in the nodes of
a decision tree. Both systems deliver accuracies in the lower half of the nineties (per cent).
These are extremely good prediction results for any classification or identification problem.

Transmembrane identification is just one of the numerous biological and other problems
to which sequence-based inductive logic programming can be applied. Inductive logic
programming in general, and even more so our specific implementation, in which the
background knowledge is in the form of callable C functions used as predicates, is a very
powerful tool that can be applied extensively to the many sequence-related biological
classification problems. Many different operators other than ‘contains’ and ‘abstains’ can
be studied, as can different translation mechanisms that are biologically relevant.


This work was carried out with the support of a research grant from ISIS, Fujitsu
Laboratories.

Abstract. Management of large projects, especially those that involve a major component
of R&D and require knowledge from diverse, specialised and sophisticated fields, may be
classified as a semi-structured problem. In such problems, there is some knowledge about
the nature of the work involved, but there are also uncertainties associated with emerging
technologies. In order to draw up a plan and schedule of activities for such a large and
complex project, the project manager is faced with a host of complex decisions, such as
when to start an activity and how long the activity is likely to continue. An Intelligent
Decision Support System (IDSS), which aids the manager in decision making and in
drawing up a feasible schedule of activities while taking into consideration the constraints
of resources and time, will have a considerable impact on the efficient management of the
project. This report discusses the design of an IDSS that helps the manager from the
project planning phase through the scheduling phase. The IDSS uses a new project
scheduling tool, the Project Influence Graph (PIG).

Keywords. Intelligent decision support system; project management; semi-structured
problems.


1. Introduction 

Management of typical large projects requires the manager to be adept at working with
resource constraints and at complex decision-making, apart from meeting tight time
schedules. If the project involves the development of a large, new, sophisticated system,
such as a Surveillance Radar or a Satellite Launch Vehicle, the manager also has to deal
with the additional dimensions of ever-changing technologies, knowledge requirements in
diverse specialised areas of work, and geographical spread in the execution of various
sub-activities of the project. Add to these the problems of escalating project costs,
unforeseen delays in certain activities, sudden non-availability of resources, and
sometimes even changes in the specification of the end system, and the project manager
must constantly monitor the project progress, re-plan or re-schedule certain activities, or
choose an alternate course of action.

Until recently, project scheduling followed the operations research (OR) approach of
viewing a project solely as a set of activities, each having a set of attributes such as
duration and precedence relations, and representing it as a network. The objective of
project scheduling was to minimise the overall project duration using CPM/PERT methods.
However, these techniques assume unlimited resources, which is generally not true in the
real world. They also have other inadequacies, such as failing to recognise the rework cycle
(Cooper 1994). To tackle the resource-constrained project scheduling problem,
mathematical linear programming methods or heuristic rule methods are used, but each has
certain limitations.

Decision support systems (DSS) and artificial intelligence techniques provide a better 
toolkit to the project manager in dealing with semi-structured scheduling problems, such as 
those faced in R&D projects. In this paper, we use a new planning and scheduling tool, the 
Project Influence Graph (PIG), and AI search techniques, to develop an IDSS for project 
management. 


2. The project scheduling problem 

A project may be defined as a collection of interrelated activities, each activity requiring
various types of resources. In addition, a project possesses the following characteristics:

• a project should have a specific goal or a set of objectives to be accomplished in a finite
time frame

• a project is homogeneous, in the sense that all activities that comprise the project
are essential for its completion, and no other outside activity has any influence on its
completion

• a large project is typically complex, involving a mixture of series and parallel activities
that require an interplay of effort, resources and time

• a project is usually non-repetitive.

Every activity of the project is described by a set of attributes, though not all the
attributes of all the activities need be known at the start of the project. Some of the
important attributes are the duration of the activity, its resource requirements, the
earliest date at which it can start, the latest date by which it must be finished, and its
precedence relationships with other activities.

The project scheduling problem is to find a sequence of activities, i.e., to assign a start
date to each activity, such that some objective, such as minimising the overall project
duration, is achieved. PERT and CPM techniques produce such a sequence when the
resource and timing constraints are ignored. The resulting sequence is a
precedence-feasible schedule, as only the precedence constraints are satisfied. Linear
programming techniques and enumerative methods solve the problem with the more
realistic resource constraints taken into consideration. However, the computational
workload involved in these methods makes them infeasible for even moderately large
scheduling problems.
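For reference, the precedence-feasible schedule produced by the CPM forward pass can be sketched in C++: each activity starts as soon as all its predecessors have finished, resources are ignored, and activities are processed in topological order (Kahn's algorithm), which also detects a cyclic precedence network. The function names are hypothetical, and the durations in the test are invented; the activity names are borrowed from the radar example in section 3.1 only for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <queue>
#include <vector>

// CPM forward pass with unlimited resources: each activity starts as soon as
// all its predecessors have finished. preds[j] lists the predecessors of j.
// Activities are visited in topological order (Kahn's algorithm); an empty
// result means the precedence network contains a cycle.
std::vector<int> earliest_starts(const std::vector<int>& duration,
                                 const std::vector<std::vector<int>>& preds) {
    assert(duration.size() == preds.size());
    const std::size_t n = duration.size();
    std::vector<std::vector<int>> succs(n);
    std::vector<int> indegree(n, 0);
    for (std::size_t j = 0; j < n; ++j)
        for (int i : preds[j]) {
            succs[i].push_back(static_cast<int>(j));
            ++indegree[j];
        }
    std::vector<int> start(n, 0);
    std::queue<int> ready;
    for (std::size_t j = 0; j < n; ++j)
        if (indegree[j] == 0) ready.push(static_cast<int>(j));
    std::size_t visited = 0;
    while (!ready.empty()) {
        const int i = ready.front();
        ready.pop();
        ++visited;
        for (int j : succs[i]) {
            start[j] = std::max(start[j], start[i] + duration[i]);
            if (--indegree[j] == 0) ready.push(j);
        }
    }
    if (visited != n) return {};
    return start;
}

// Project duration: the latest finish time over all activities.
int project_duration(const std::vector<int>& duration, const std::vector<int>& start) {
    int end = 0;
    for (std::size_t i = 0; i < start.size(); ++i) end = std::max(end, start[i] + duration[i]);
    return end;
}
```

With technology survey (2), feasibility study (3), design (4, after both) and fabrication (6, after design), the earliest starts are 0, 0, 3 and 7, and the project takes 13 time units.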

Noronha & Sarma (1989) describe the application of AI search techniques to the
resource-constrained project scheduling problem. They propose using informed search
techniques such as the A* algorithm to find a feasible schedule, an approach that has been
employed in the development of this IDSS.
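The following C++ sketch shows A* on a deliberately simplified version of the problem; it is not the formulation of Noronha & Sarma (1989), whose state representation and heuristic are not described here. Assumptions: a single renewable resource with a fixed capacity, and a schedule made of consecutive phases, where each phase runs a set of ready activities whose total demand fits the capacity and lasts as long as its longest activity. A search state is the set of finished activities, so path costs simply add up, as plain A* requires. The heuristic, the larger of the longest unfinished precedence chain and the remaining work divided by the capacity, never overestimates, because the activities of a chain must lie in different phases and no phase can do more than `capacity` units of work per unit time. All names are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <functional>
#include <queue>
#include <tuple>
#include <unordered_map>
#include <vector>

// Hypothetical activity record. Predecessors must have smaller indices
// (activities listed in topological order).
struct Activity {
    int duration;
    int demand;              // units of the single shared resource
    std::vector<int> preds;
};

// Admissible estimate of the remaining cost from a set of finished activities.
int remaining_bound(const std::vector<Activity>& acts, std::uint32_t done, int capacity) {
    const int n = static_cast<int>(acts.size());
    std::vector<int> chain(n, 0);  // longest unfinished chain ending at each activity
    int longest = 0;
    long long work = 0;
    for (int i = 0; i < n; ++i) {
        if (done & (1u << i)) continue;
        int before = 0;
        for (int p : acts[i].preds) {
            assert(p < i);
            if (!(done & (1u << p))) before = std::max(before, chain[p]);
        }
        chain[i] = before + acts[i].duration;
        longest = std::max(longest, chain[i]);
        work += static_cast<long long>(acts[i].duration) * acts[i].demand;
    }
    const int by_work = static_cast<int>((work + capacity - 1) / capacity);
    return std::max(longest, by_work);
}

// A* over sets of finished activities in the simplified phase model.
// Returns the minimum total length, or -1 if no feasible schedule exists.
int astar_schedule(const std::vector<Activity>& acts, int capacity) {
    const int n = static_cast<int>(acts.size());
    assert(n < 32 && capacity > 0);
    const std::uint32_t goal = (1u << n) - 1;
    using Node = std::tuple<int, int, std::uint32_t>;  // (f = g + h, g, finished set)
    std::priority_queue<Node, std::vector<Node>, std::greater<Node>> open;
    std::unordered_map<std::uint32_t, int> best_g;
    best_g[0] = 0;
    open.emplace(remaining_bound(acts, 0u, capacity), 0, 0u);
    while (!open.empty()) {
        int g;
        std::uint32_t done;
        std::tie(std::ignore, g, done) = open.top();
        open.pop();
        if (g > best_g[done]) continue;  // stale entry: a cheaper path was found
        if (done == goal) return g;
        // Ready activities: unfinished, with every predecessor finished.
        std::vector<int> ready;
        for (int i = 0; i < n; ++i) {
            if (done & (1u << i)) continue;
            bool ok = true;
            for (int p : acts[i].preds) ok = ok && (done & (1u << p)) != 0;
            if (ok) ready.push_back(i);
        }
        // Successors: every non-empty subset of ready activities that fits the capacity.
        const int m = static_cast<int>(ready.size());
        for (std::uint32_t subset = 1; subset < (1u << m); ++subset) {
            int demand = 0, length = 0;
            std::uint32_t next = done;
            for (int k = 0; k < m; ++k) {
                if (!(subset & (1u << k))) continue;
                demand += acts[ready[k]].demand;
                length = std::max(length, acts[ready[k]].duration);
                next |= 1u << ready[k];
            }
            if (demand > capacity) continue;
            const int g2 = g + length;
            const auto it = best_g.find(next);
            if (it != best_g.end() && it->second <= g2) continue;
            best_g[next] = g2;
            open.emplace(g2 + remaining_bound(acts, next, capacity), g2, next);
        }
    }
    return -1;
}
```

In a small example with activities (duration, demand) of (3, 2), (2, 2) and (4, 1), the third following the first, and a capacity of 3, the first two cannot run together, so the best plan runs activity 0 first and then activities 1 and 2 together, for a total of 3 + 4 = 7.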


3. Project influence graphs (PIG) 

The activity precedence network, or PERT network, has long been used as a planning tool
for project management. However, this graphical tool does not explicitly depict the
decisions that were involved when the activities were being drawn up. During the
monitoring and analysis stage of project management, it is often beneficial to know why a
particular decision was taken, what factors influenced the decision during the planning
stage, and so on. Another drawback of the PERT network is that it does not explicitly
allow the manager to choose among the alternative courses of action available to him: a
separate PERT network needs to be drawn up for each option of each activity. For
example, a radar signal processor may be developed using DSP processors, using array
processors, using microprogrammable slice processors, or by designing ASICs. Each course
of action needs to be carefully evaluated with respect to duration, resource requirements
and impact on other activities, and the manager has to choose the most feasible option.
Finally, a PERT network sometimes poses the problem of “informational overload” to the
project manager. In large projects involving thousands of small activities, a top-level
project manager need not be aware of all the details of every small activity. It is essential
that the project be viewed at different levels of detail by different levels of project
managers. In short, what is required is a hierarchical abstraction of activities and
information.

Project Influence Graph (Noronha 1993) addresses these problems. A PIG effectively 
combines the features of AND/OR trees, activity precedence networks, influence diagrams 
and decision trees. 

A PIG (figures 1 and 2) consists of a hierarchy of levels $L_1, L_2, \ldots, L_k$, with each
level providing a different view of the project. The higher the level (level $L_i$ is higher
than level $L_j$ if $i < j$), the greater the abstraction of the project details. Each project
description level consists of a set of statement nodes, decision nodes, object nodes and
value nodes, interconnected by different types of arcs and links, such as dependency arcs,
informational arcs, precedence arcs and state change arcs. The sub-graph formed by the set
of activity (object) nodes together with the precedence arcs is a directed acyclic graph
(DAG). The sub-graph formed by the set of statement nodes, decision nodes, dependency
arcs and informational arcs is also a DAG, in accordance with the definition of a “regular”
influence diagram.

Statement nodes correspond to the chance nodes of influence diagrams. As in influence
diagrams, statement nodes are depicted pictorially by circles or ellipses, and decision
nodes by rectangles. As stated earlier, the value nodes are represented by diamonds in a
PIG, while the object nodes are represented as rectangles with rounded corners. While in
influence diagrams both the dependency arcs and the informational arcs are represented as
solid boldface arrows, a PIG distinguishes between the two: dependency arcs are
represented by solid boldface arrows and informational arcs by dashed arrows. A dotted
arrow depicts a time precedence arc.

Figure 1. (a) ‘Growing’ a PIG for a radar development project, showing the different
available alternative approaches. (b) An alternative viewpoint of layer 1.

The statement nodes help in representing the various facts or pieces of knowledge that
need to be considered while reaching a decision, or that influence an activity. Each
statement node $S_i$ has a random variable $X_i$ attached to it, which has either a discrete
or a continuous probability distribution. For computer representation and evaluation, the
random variables are usually discrete, each having a finite set of outcomes. Alternatively,
each statement node may have a fuzzy variable (Zadeh 1965) attached to it, thus forming a
cognitive map instead of an influence diagram, which can be subjected to qualitative
evaluation (Zhang et al 1989). The knowledge about the uncertainty associated with a
statement is propagated across the dependency arcs linking the statement nodes using
either Bayesian probability analysis or fuzzy set calculus, depending on the type of
variable attached to the nodes.

Figure 2. Decomposition of an activity into sub-activities and influence diagram
for activity duration estimation.
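For discrete statement nodes, one step of this propagation can be sketched as a marginalisation, assuming the parent statements of a ‘duration’ node are independent (no arcs among them): P(D = d) = Σ_x P(D = d | x) Π_k P(X_k = x_k). The C++ below is a hypothetical illustration of that single step, not the IDSS's evaluation algorithm; influence diagrams with dependent parents or decision nodes need full influence-diagram evaluation. The function names are assumptions.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Distribution of a child statement node whose parents are independent discrete
// nodes: P(child = c) = sum over joint parent states x of P(child = c | x) * prod_k P(x_k).
// cpt[index][c] is P(child = c | x); `index` enumerates x in mixed radix,
// the first parent varying slowest.
std::vector<double> marginal(const std::vector<std::vector<double>>& parents,
                             const std::vector<std::vector<double>>& cpt) {
    std::size_t combos = 1;
    for (const auto& p : parents) {
        double sum = 0.0;
        for (double q : p) sum += q;
        assert(std::fabs(sum - 1.0) < 1e-9);  // each parent must be a distribution
        combos *= p.size();
    }
    assert(!cpt.empty() && cpt.size() == combos);
    std::vector<double> result(cpt.front().size(), 0.0);
    for (std::size_t index = 0; index < combos; ++index) {
        // Probability of the joint parent state encoded by `index`.
        double weight = 1.0;
        std::size_t rest = index;
        for (std::size_t k = parents.size(); k-- > 0;) {
            weight *= parents[k][rest % parents[k].size()];
            rest /= parents[k].size();
        }
        for (std::size_t c = 0; c < result.size(); ++c) result[c] += weight * cpt[index][c];
    }
    return result;
}

// Expected value of a discrete node, e.g. the expected duration in months.
double expected_value(const std::vector<double>& dist, const std::vector<double>& values) {
    assert(dist.size() == values.size());
    double e = 0.0;
    for (std::size_t i = 0; i < dist.size(); ++i) e += dist[i] * values[i];
    return e;
}
```

With two invented parents (say, technology maturity with probabilities 0.7/0.3 and staff availability with 0.8/0.2) and a conditional table giving P(6 months) of 0.9, 0.6, 0.5 and 0.2 for the four joint parent states, P(6 months) = 0.56(0.9) + 0.14(0.6) + 0.24(0.5) + 0.06(0.2) = 0.72, and the expected duration over 6 and 9 months is 0.72 × 6 + 0.28 × 9 = 6.84 months.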

PIG distinguishes two important types of statement nodes: observable chance nodes and
deterministic nodes. Observable chance nodes are those whose outcome is not known at the
time of planning but can be observed at a later point in time, so that subsequent decisions
have to be made on the basis of the observed outcome. Observable chance nodes are drawn
as meters and are always direct informational predecessors of decision nodes.
Deterministic nodes are those whose outcome can be determined given the outcomes of their
conditional predecessors; they are represented as triangles. In the present implementation,
however, the statement nodes are not sub-classified.

The object nodes represent activities and resources. Activity nodes are linked to
statement nodes via structural links. For example, the statement node ‘duration of
activity’ is linked to the activity node ‘activity’ through a structural link. Since activity
nodes (and, in general, all nodes) are represented at the lower level as frames (Fox 1985),
statement nodes attached to object nodes are represented as frame slots.

The activities of the different hierarchy levels form the activity-subactivity structure. The
set of activities in level $L_j$ that are linked to an activity $A_{ik}$ in level $L_i$ ($i = j - 1$)
forms either the set of sub-activities of the activity $A_{ik}$ or the set of alternative
courses of action available to accomplish the activity $A_{ik}$. Thus, the links between
activities across hierarchy levels form an AND/OR tree structure. If an activity
decomposition is an AND decomposition, the set of activities in the lower level to which it
is linked forms the set of sub-activities, while an OR decomposition leads to a set of
alternatives. Note that activity decomposition links may connect activities only between
two adjacent hierarchy levels. This hierarchical structure is similar to the goal-subgoal
decomposition used in AI techniques. It also permits information abstraction, providing
simpler views of the project at higher levels and detailed views at lower levels.
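The AND/OR structure lends itself to a simple recursive evaluation, sketched below in C++ under an assumption the paper does not make: that each leaf activity carries a single additive cost estimate, so an AND node costs the sum of its sub-activities and an OR node the cheapest of its alternatives. The sketch also checks that decomposition links join only adjacent levels. `ActivityNode` and `best_cost` are hypothetical names, and any costs used with it are illustrative.

```cpp
#include <algorithm>
#include <cassert>
#include <limits>
#include <string>
#include <vector>

// An activity in the PIG hierarchy (hypothetical layout). An AND node is complete
// only when all its sub-activities are; an OR node needs any one alternative.
struct ActivityNode {
    std::string name;
    int level;                           // hierarchy level: 1 for L_1, 2 for L_2, ...
    bool is_and;                         // AND decomposition (true) or OR alternatives (false)
    double cost;                         // used only at leaves (assumed additive estimate)
    std::vector<ActivityNode> children;  // sub-activities or alternatives
};

// Cheapest way to accomplish an activity: AND nodes add up their children,
// OR nodes take the cheapest alternative. Decomposition links must join
// adjacent levels only.
double best_cost(const ActivityNode& node) {
    if (node.children.empty()) return node.cost;
    double total = node.is_and ? 0.0 : std::numeric_limits<double>::infinity();
    for (const ActivityNode& child : node.children) {
        assert(child.level == node.level + 1);
        const double c = best_cost(child);
        total = node.is_and ? total + c : std::min(total, c);
    }
    return total;
}
```

For instance, if a level-1 OR node has a coherent-radar alternative decomposed (AND) into sub-activities costing 2 and 5, and a non-coherent alternative costing 9, the cheapest way to accomplish it costs 7.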

Hierarchy levels are represented as shaded planes, to give the impression of layers 
stacked along the vertical axis. Thus, activity decomposition takes place along this axis. 

The advantages offered by PIG when compared to the other graphical representation 
tools for the project planning are elucidated by Noronha (1993). 

3.1 An example 

Let us consider the example of a project for the development of a new surveillance radar
system. The main objective of such a project may be to acquire state-of-the-art technology
in the field of radars and to improve upon present radar systems. The decision to acquire
such a technology, and hence to embark on such a project, may itself result from a number
of factors, such as obsolescence of the present equipment, the security scenario of the
country, the relationship of the country with its neighbours, economic feasibility, and the
availability of and access to such a technology from elsewhere. The concepts of decision
theory, cognitive mapping and influence diagrams may be used for arriving at this decision
itself. However, we shall assume that such a decision has been made and that the project
has to be undertaken.

Since the main aim of the project is the acquisition of the new technology, this forms the
value node of level 1 of the PIG (figure 1a). This aim is achieved by the successful
development and fabrication of a new surveillance radar system, which forms the main
activity of the project. Another viewpoint could be that level 1 has the value node of
providing national security, which is determined by the decision to acquire state-of-the-art
technology. The various factors that influence this decision may also be depicted in
layer 1. The outcome of this decision may be to develop a new radar system, so this
activity replaces the decision node in layer 1. This is shown in figure 1b. Once the
influence diagram shown in figure 1b has been evaluated and the decision to develop a
new radar system has been taken, figure 1a replaces figure 1b and becomes the starting
point for the ‘growth’ of the PIG.

Three different approaches may be considered for building the new radar system: the
coherent radar approach, the coherent-on-receive approach and the non-coherent radar
approach. The choice among these approaches may depend on a few technical factors,
such as the expected clutter power, and on basic requirements of the radar, such as the
required detection probability and false alarm probability. Thus the fundamental activity
of level 1, the development of a new radar system, requires a decision regarding the
approach to be followed, depicted by the decision node of level 2. The various factors that
influence this decision are also shown linked to this decision node. The three alternatives
of the decision node are shown as activity nodes in level 2, each linked to the decision
node. The decision node is an OR node and the activity nodes are its children.

Assuming that the decision to follow the coherent radar approach is finalised, further
development of the PIG is shown in figure 2. However, while a PIG is being grown,
decision making is deferred until the PIG has been fully constructed. Thus, during the
‘growing’ phase, all three options of the decision node are further expanded or
decomposed; only one option is considered here, as a representative example. The
‘development of a coherent radar system’ may be considered to consist of a number of
sub-activities, such as technology survey, feasibility study, design, development and
fabrication, laboratory validation and final field trials. These sub-activities form part of
hierarchy level 3 and are linked to the parent activity in level 2, which is an AND node.
This is because the activity of developing the coherent radar system can be considered
complete only when each of its sub-activities has been successfully completed.

Figure 3. Decomposition of an activity ‘design’ into sub-activities.

The activity ‘fabrication’ of level 3, in figure 2, is linked to the statement node ‘duration’, 
which is in turn shown to be linked to other statement nodes via dependency arcs. This 
represents the fact that the duration estimation of the activity ‘fabrication’ is influenced by 
these factors. The activity of design can only be embarked upon after the technology survey 
and feasibility study have been completed. This precedence constraint is represented by 
the dotted time precedence arrows between the nodes. 

Figure 3 shows the decomposition of the activity ‘design’ into its component
sub-activities, in hierarchy level 4.


4. An IDSS using PIG 

The ultimate goal of project scheduling is to decide upon a sequence of activities, together
with start and end dates for each activity, that is feasible within the constraints of
resources, precedence and time. While attempting to draw up such a plan, the project
manager is confronted with two major decisions that he has to make for each activity,
taking into consideration the constraints mentioned earlier:

• when to schedule an activity

• once an activity has been scheduled, what its probable duration is.

Another important type of decision that a project manager faces in this phase of project
execution is the choice of the course of action to be followed for an activity, among
different alternatives. Since these decisions have to be made for each and every activity,
an intelligent decision support system will be of help to the project manager in scheduling
a large, complex project consisting of numerous and varied activities, with a large number
of factors influencing these decisions.

The IDSS has been developed in the Object Oriented Programming (1989) language
C++. The different nodes of the PIG are treated as objects, with interactions between
them. At a higher level, the different nodes of the PIG are represented as frames (Fox
1985), with frame slots containing the various parameters that describe a node. Thus the
entire PIG is represented as a frame network. However, the deterministic nodes and the
observable chance nodes have not been represented separately, since they are basically
chance nodes with special properties.

The IDSS uses the PIG to help the project manager organise the information available
about the project, with the hierarchical structure of the PIG helping him in “goal
decomposition”. Once the PIG has been “grown” by interacting with the project manager,
the IDSS formulates the scheduling problem as a state space search problem and uses the
A* algorithm (Nilsson 1981) to arrive at the goal state, which is simply the state in which
all the activities have been scheduled. To estimate the duration of an activity that has not
been specified by the manager, the IDSS forms an influence diagram (Miller et al …;
Howard & Matheson 1981) of all the factors and decisions that influence the duration of
the activity, and evaluates it. The same methodology is employed to help the project
manager choose between the various alternative courses of action available to him for
pursuing an activity.

The IDSS uses a combination of menu-driven and dialogue approaches to interact with
the user. A graphical package that includes all the graph edit features would make the
IDSS more user-friendly. The user starts a session by inputting the main activity node
into the system. Going back to the example of the development of a radar system, the user
inputs an activity node by that name into hierarchy level 1. The system then switches over
to the conversational mode, asking the user for the various attributes of this node.
Depending on the attribute values input by the user, the system proceeds to build the PIG
with the help of the user. A session can also be commenced by loading a previously stored
PIG from the database file.

The IDSS consists of several modules, which may be grouped into user interaction modules,
file I/O modules and evaluation modules.

4.1 User interaction modules

These modules help in inputting the PIG, interacting with the user to develop a PIG, and
displaying it. Correspondingly, there are three modules: the graph edit module, the graph
network module and the graph display module.

The graph edit module performs all the editing operations on the PIG. This module contains
routines for adding a new node, modifying the parameters of an existing node and deleting a
node. It treats each node as an independent entity: while an operation is performed on a
node, the rest of the graph is not considered. This is helpful because, in reality, the
knowledge about the different activity nodes and their corresponding statement nodes, etc.,
may be held by different people. For example, the attribute values of the ‘transmitter
design’ activity may be better known to an expert in that field, while the attribute values
of the ‘signal processor design’ activity may be better specified by an expert in signal
processing techniques. Thus, when the details of a particular activity are being input by one
person, he or she may not be aware of the knowledge already acquired by the IDSS, or of what
information is still to be fed in.

Once all the nodes and their parameter values have been input using the edit module, the 
graph network module goes about linking these nodes based on the information present in 
each node. If in the process of linking, it finds that a particular node to which a link should 
exist has not yet been input, it interacts with the user to get the details of this node. 
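The division of labour between the graph edit and graph network modules can be sketched as follows. This is a minimal Python illustration, not the IDSS's C++ code: the `Node` fields, the `GraphEditor` class and the `link_nodes` helper are hypothetical, and it assumes each node records only the names of the nodes it should be linked to, so that nodes can be entered independently and linked later.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    """A PIG node entered on its own; links are kept only as names (hypothetical fields)."""
    name: str
    kind: str                                   # e.g. "activity", "chance", "decision", "value"
    params: dict = field(default_factory=dict)
    linked_to: list = field(default_factory=list)

class GraphEditor:
    """Edit operations: each node is handled independently of the rest of the graph."""
    def __init__(self):
        self.nodes = {}

    def add(self, node):
        self.nodes[node.name] = node

    def modify(self, name, **params):
        self.nodes[name].params.update(params)

    def delete(self, name):
        self.nodes.pop(name, None)

def link_nodes(nodes):
    """Graph network step: build edges from the names stored in each node.

    Returns the edge list and the referenced nodes not yet entered; in the IDSS the
    user would be asked for the details of those missing nodes.
    """
    edges, missing = [], []
    for node in nodes.values():
        for target in node.linked_to:
            if target in nodes:
                edges.append((node.name, target))
            elif target not in missing:
                missing.append(target)
    return edges, missing
```

If two experts enter ‘transmitter design’ and ‘signal processor design’ nodes that both point to an integration activity nobody has entered yet, `link_nodes` returns no edges and reports the integration node as missing, which is the point at which the network module would start a dialogue with the user.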

As the name suggests, the graph display module displays the presently active PIG on 
the terminal screen. 

4.2 File I/O module 

The file I/O module stores the input PIG in a file whose format is similar to that of a
relational database file. The various nodes of the PIG are stored as objects, while the links
between the nodes are stored as relational tuples. Apart from the PIG, the resource
availability list also needs to be stored; it is kept in a separate file.
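The text does not give the actual file layout, so the following is only an illustrative sketch of the idea of storing nodes as objects and links as relational tuples. It assumes JSON for the node objects and CSV rows for the link relation; `save_pig` and `load_pig` are hypothetical names.

```python
import csv
import io
import json

def save_pig(nodes, links):
    """Serialise a PIG: node objects as JSON text, links as CSV rows of (from, to) tuples."""
    node_text = json.dumps(nodes, sort_keys=True)
    buf = io.StringIO(newline="")
    writer = csv.writer(buf)
    writer.writerow(["from_node", "to_node"])       # header row of the link relation
    writer.writerows(links)
    return node_text, buf.getvalue()

def load_pig(node_text, link_text):
    """Inverse of save_pig: rebuild the node dictionary and the list of link tuples."""
    nodes = json.loads(node_text)
    rows = list(csv.reader(io.StringIO(link_text, newline="")))
    return nodes, [tuple(row) for row in rows[1:]]  # skip the header row
```

The resource availability list could be written the same way to its own file, keeping it separate from the PIG as the text describes.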

4.3 Evaluation modules 

Apart from the modules for evaluating the influence diagram and scheduling activities,
there is a module for checking the consistency of the PIG. This module ensures that there
are no cycles in the activity precedence network. It is also invoked by the influence
diagram evaluation module to check that the influence diagram is regular and contains no
cycles.
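A cycle check of this kind can be sketched with Kahn's topological sort. This is one standard technique, chosen here for illustration; the text does not say which method the IDSS uses. The sketch assumes the precedence network is given as a mapping from each activity to its predecessors.

```python
from collections import deque

def find_cycle_free_order(preds):
    """Return a topological order of the activities, or None if the network has a cycle.

    preds maps each activity to the list of activities that must precede it.
    """
    nodes = set(preds) | {p for ps in preds.values() for p in ps}
    indegree = {n: 0 for n in nodes}
    succs = {n: [] for n in nodes}
    for node, ps in preds.items():
        for p in ps:
            indegree[node] += 1
            succs[p].append(node)
    ready = deque(sorted(n for n in nodes if indegree[n] == 0))
    order = []
    while ready:
        n = ready.popleft()
        order.append(n)
        for s in succs[n]:
            indegree[s] -= 1
            if indegree[s] == 0:
                ready.append(s)
    # Nodes left out of the order lie on (or behind) a cycle.
    return order if len(order) == len(nodes) else None
```

The same function serves for the influence diagram check if arcs are supplied as a node-to-parents mapping.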

The scheduling problem is formulated as a state space search problem, with the state
defined as the set of activities that have been fully scheduled. The A* algorithm is applied
to reach the goal state in which all the activities are scheduled. The initial state has an
empty set of activities and is put on the open list. Each state has an attached heuristic
value, which is the estimated time for completion of the project when only the precedence
constraints are considered. The open list is always kept sorted in ascending order of this
heuristic value. The first state in the open list is picked and checked to see whether it is
the goal state. If not, the state is put on the closed list, and is
BEGIN
    form the goal state
    initialise project calendar to the overall project start date
    make a list of activities that have no preceding activity constraint
    FOR each possible combination of this list of activities,
        check if resource requirements of the combination are met
        if yes, form a state node and calculate its heuristic value
    place all the state nodes in the open list, sorted in ascending order
    DO BEGIN
        IF the open list is empty, EXIT with failure
        pick the first state node from the open list
        advance calendar to the nearest end date or to the nearest
            resource availability date
        mark all the activities which have end dates equal to the
            present calendar date as finished
        IF present state is goal state, EXIT with success
        add the present state node to the closed list
        expand present state to create successor state nodes
        add the successor nodes to the open list, maintaining the
            ascending order
    END
END


Figure 4. Scheduling algorithm.


expanded to derive the successor states which are inserted into the open list based on their 
heuristic values. If the open list empties before the goal state is reached then the algorithm 
has failed to find a feasible schedule of activities. 

The scheduling algorithm (figure 4) makes use of two other data structures, the Project
Calendar and the Resource Availability List. The project calendar is akin to the simulation
clock employed in discrete event system simulations. It is initialised to the start date of
the first set of activities that are scheduled, and is advanced to the nearest end date among
the activities presently in progress. The activity ending on that date is marked finished,
thus enabling the scheduling of its successor activities, subject to the availability of
resources and the earliest start date constraints. The resource availability list maintains
the list of available resources and the dates at which they become available.
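The search of figure 4 can be sketched compactly in Python. This is a simplified illustration under stated assumptions, not the IDSS implementation: the activity data are hypothetical, resources are modelled as a single pool of constant capacity (so the calendar only advances to activity end dates, never to resource availability dates), and sub-activity recursion and influence diagram calls are omitted. As in the text, a state's heuristic value is the project completion time estimated from precedence constraints alone; because it ignores resources it never overestimates, so the first goal state popped has the shortest completion time.

```python
import heapq
from itertools import combinations

# Hypothetical data: name -> (duration, resource units needed, predecessors)
ACTIVITIES = {
    "spec": (3, 1, []),
    "survey": (2, 1, []),
    "design": (4, 2, ["spec", "survey"]),
    "test": (2, 1, ["design"]),
}
CAPACITY = 2  # assumed: one resource pool with constant availability

def heuristic(now, finished, running):
    """Estimated project completion date using precedence constraints only."""
    finish = {name: now for name in finished}
    finish.update(dict(running))            # running: (name, end date) pairs

    def earliest_finish(name):
        if name not in finish:
            duration, _, preds = ACTIVITIES[name]
            start = max([now] + [earliest_finish(p) for p in preds])
            finish[name] = start + duration
        return finish[name]

    return max(earliest_finish(name) for name in ACTIVITIES)

def successors(state):
    """Start any resource-feasible subset of eligible activities, then advance the calendar."""
    now, finished, running = state
    busy = {name for name, _ in running}
    used = sum(ACTIVITIES[name][1] for name in busy)
    eligible = [name for name, (_, _, preds) in ACTIVITIES.items()
                if name not in finished and name not in busy
                and all(p in finished for p in preds)]
    for r in range(len(eligible) + 1):
        for combo in combinations(eligible, r):
            if used + sum(ACTIVITIES[n][1] for n in combo) > CAPACITY:
                continue
            in_progress = list(running) + [(n, now + ACTIVITIES[n][0]) for n in combo]
            if not in_progress:
                continue                    # nothing running and nothing started: no progress
            t = min(end for _, end in in_progress)       # nearest end date
            done = frozenset(n for n, end in in_progress if end == t)
            still = tuple(sorted((n, end) for n, end in in_progress if end > t))
            yield (t, finished | done, still), combo

def schedule():
    """A* over scheduling states; returns (completion date, [(activity, start date)])."""
    start = (0, frozenset(), ())
    tie = 0                                 # tie-breaker so heapq never compares states
    open_list = [(heuristic(*start), tie, start, [])]
    closed = set()
    while open_list:
        _, _, state, plan = heapq.heappop(open_list)
        now, finished, _ = state
        if len(finished) == len(ACTIVITIES):
            return now, plan                # goal: every activity finished
        if state in closed:
            continue
        closed.add(state)
        for succ, combo in successors(state):
            tie += 1
            new_plan = plan + [(n, now) for n in combo]
            heapq.heappush(open_list, (heuristic(*succ), tie, succ, new_plan))
    return None                             # open list emptied: no feasible schedule
```

With the data above, `schedule()` returns a completion date of 9, starting ‘spec’ and ‘survey’ in parallel at date 0, ‘design’ at 3 and ‘test’ at 7. This mirrors the choice described in the sample session between running two eligible activities in sequence or in parallel.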

The search module is invoked recursively while scheduling an activity that has
sub-activities. Suppose an activity a1 has met all the constraints and can be scheduled. The
algorithm first checks whether all the sub-activities of a1 have been scheduled. If not, the
search algorithm descends to the immediate lower hierarchy level and invokes itself to
schedule the sub-activities of a1. In case an activity has different alternative courses of
action and a choice has to be made, the influence diagram evaluation module is invoked. This
module is also invoked whenever an activity duration has to be estimated.
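The recursive descent through the hierarchy can be illustrated with a much simplified sketch. It uses hypothetical hierarchy data, and it replaces the full A* search at each level with precedence-only earliest-start scheduling, with resources ignored. A composite activity's duration is obtained by scheduling its sub-activities one level down.

```python
from functools import lru_cache

# Hypothetical PIG hierarchy: composite activities list sub-activities, leaf
# activities carry a duration; "preds" refer to siblings on the same level.
PIG = {
    "radar": {"subs": ["spec", "survey", "design"]},
    "spec": {"duration": 3},
    "survey": {"duration": 2},
    "design": {"subs": ["tx", "sp"], "preds": ["spec", "survey"]},
    "tx": {"duration": 5},
    "sp": {"duration": 4},
}

@lru_cache(maxsize=None)
def duration(name):
    """Duration of an activity; descends one hierarchy level when it has sub-activities."""
    node = PIG[name]
    if "subs" not in node:
        return node["duration"]
    finish = {}

    def finish_of(sub):
        if sub not in finish:
            start = max([0] + [finish_of(p) for p in PIG[sub].get("preds", [])])
            finish[sub] = start + duration(sub)   # recursive call for composite subs
        return finish[sub]

    return max(finish_of(sub) for sub in node["subs"])
```

Here `duration("design")` is 5, because its two sub-activities run in parallel, and `duration("radar")` is 8, because ‘design’ cannot start before ‘spec’ ends at 3.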

The influence diagram evaluation module requests the user to enter the conditional
probability values in the form of ratings, i.e., the user is asked to assign integer values to
the outcomes of a chance node, dependent on the outcomes of its conditional predecessors.
The ratings are then normalised to derive the probability values. This approach has been
followed because people find it easier to deal with whole numbers than with fractions.
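The normalisation step is simple: each rating is divided by the sum of the ratings for that conditioning case. A minimal sketch follows. The function name is illustrative, and exact fractions are used only to avoid rounding.

```python
from fractions import Fraction

def ratings_to_probabilities(ratings):
    """Normalise integer ratings so they sum to one (one column of a conditional table)."""
    total = sum(ratings.values())
    if total <= 0:
        raise ValueError("at least one rating must be positive")
    return {outcome: Fraction(r, total) for outcome, r in ratings.items()}

# The 4:1:5 rating for twt:mag:cfa quoted in section 5 for indigenously
# available tubes gives 0.4, 0.1 and 0.5, the 'indigenous' column of table 1.
probs = ratings_to_probabilities({"twt": 4, "mag": 1, "cfa": 5})
```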
The influence diagram is then evaluated by applying a sequence of graphical transformations
that preserve the information in the diagram, and the expected value of the duration is
derived. The algorithm proposed by Shachter (1986) for the evaluation of influence diagrams
has been employed.
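The basic probabilistic operation behind removing a chance node is summing it out into its successor, P(child) = Σ_a P(child | a) P(a). The sketch below applies this to the values of tables 1 and 2 in section 5. It assumes, for illustration only, that ‘component-availability’ is the sole influence on ‘tx-tube’. It shows one reduction step, not Shachter's full algorithm: arc reversal and decision-node removal are omitted.

```python
# Table 1: P(tx-tube | component-availability), one entry per conditioning value
P_TUBE_GIVEN_AVAIL = {
    "import":     {"twt": 0.3636, "mag": 0.1818, "cfa": 0.4545},
    "indigenous": {"twt": 0.4000, "mag": 0.1000, "cfa": 0.5000},
    "develop":    {"twt": 0.2500, "mag": 0.6250, "cfa": 0.1250},
}
# Table 2: marginal P(component-availability)
P_AVAIL = {"import": 0.3333, "indigenous": 0.5555, "develop": 0.1111}

def remove_chance_node(conditional, marginal):
    """Sum out a parent chance node: P(child) = sum over a of P(child | a) * P(a)."""
    result = {}
    for parent_value, p_parent in marginal.items():
        for child_value, p_child in conditional[parent_value].items():
            result[child_value] = result.get(child_value, 0.0) + p_child * p_parent
    return result

p_tube = remove_chance_node(P_TUBE_GIVEN_AVAIL, P_AVAIL)
# roughly {'twt': 0.371, 'mag': 0.186, 'cfa': 0.443}
```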

4.4 A sample session 

A sample session with the IDSS for the planning and scheduling of the R&D project of
developing a new surveillance radar is presented in this sub-section.

The final objective of the R&D project may be stated as “acquisition of the surveillance
radar technology”, which is depicted as the value node in level 1 of figure 1. The activity
which realises this objective is the “development of a new surveillance radar system”. The
user uses the edit mode of the IDSS to enter these two nodes into the system database. Once
the user specifies this activity node to the system, the system enters a dialogue mode,
requesting the various attribute values of the activity through a series of relevant
questions. For example, the user is asked whether several different options are available
for building this radar system. If not, the system checks whether this activity can be
decomposed into sub-activities. The user may decompose it into a set of
sub-activities/alternatives, which are depicted in level 2. Once the user specifies this
decomposition, the system elicits the attribute values of these activities, finding out
factors such as their durations and the resources they require. Thus, the PIG is grown
recursively.

Once the input PIG has passed the consistency check successfully, the system invokes the
A* algorithm to come up with a schedule of activities. The A* algorithm initially starts at
level 1, with the goal state being the scheduled activity “development of a new surveillance
radar system”. In order to schedule this activity, the algorithm checks whether a set of
conditions is satisfied, such as

• if this activity has predecessor activities that have to be completed before it may be
scheduled, whether these activities have been completed;

• whether all the resources required for this activity are available;

• if this activity has sub-activities/alternatives in the lower level, whether all of them
have been scheduled.
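The three conditions listed above can be expressed as a single check. In this sketch the activity record fields (`preds`, `units`, `subs`) are hypothetical. The function returns the unmet conditions rather than a yes/no answer, so that a caller could tell when the only obstacle is unscheduled sub-activities, which is the case that triggers the recursive search.

```python
def unmet_conditions(activity, finished, scheduled, free_units):
    """List which scheduling conditions an activity fails (hypothetical record fields).

    activity: dict with optional "preds" (names), "units" (resources needed), "subs" (names).
    finished / scheduled: sets of activity names already finished / already scheduled.
    """
    unmet = []
    if not all(p in finished for p in activity.get("preds", [])):
        unmet.append("predecessors not completed")
    if activity.get("units", 0) > free_units:
        unmet.append("resources not available")
    if not all(s in scheduled for s in activity.get("subs", [])):
        unmet.append("sub-activities not scheduled")
    return unmet
```

For the top-level radar activity, whose only obstacle is its unscheduled sub-activities, the result is `["sub-activities not scheduled"]`, so the search would descend one level.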

Since this activity has a set of sub-activities, the A* algorithm is invoked recursively,
with the goal state now being that this set of sub-activities is scheduled and finished.
There are now several distinct options for scheduling this set of activities. The activities
“specification finalisation” and “technology survey” are both eligible to be scheduled, since
both meet the set of conditions. Either activity may be scheduled followed by the other, or
both may be performed in parallel. The algorithm chooses the option which minimises the
overall project duration, provided the resources required for that option are available.
Once the activity “specification finalisation” is scheduled, i.e., its start date has been
finalised, its duration is needed to decide on its end date. If the duration is not clearly
known to the manager, but he instead comes up with a graph of the factors that influence the
duration, as shown in figure 1, the influence diagram module is invoked to evaluate this
influence diagram and estimate the expected duration.


5. An example evaluation of an influence diagram 

The evaluation of the hypothetical influence diagram shown as part of figure 3, for the
decision on the choice of the alternative to be followed for the development of a new radar
system, is shown below.

The probability tables below are listed in the order in which the IDSS requests the
information from the user. The IDSS asks the user to provide ratings instead of
probabilities, since the human mind finds it easier to deal with integral values.

Table 1 contains the probability of choosing each transmitter tube, given that the tube will
have to be imported, is available indigenously, or has to be developed. This table
represents the preference of the project engineer for a particular type of transmitter tube
under each constraint. For example, given that the transmitter tube can be imported, the
project engineer prefers the ‘cfa’ tube to the other types of tube. The values of this table
are thus input by the project engineer who is dealing with the design of the transmitter
sub-system. The values of table 2 are input by the project manager. Table 2 shows that the
project manager prefers to use a component that is available indigenously rather than either
importing or developing it, possibly having considered the financial limitations. Thus,
table 2 represents the knowledge or preference of the overall project manager. This is an
example of how the knowledge of various experts is pooled together while taking a decision.
In table 3 the project engineer inputs his choice of a particular type of transmitter tube
given the different expected clutter scenarios. In case the radar is likely to encounter
strong clutter, the engineer prefers to use either the ‘cfa’ tube or the ‘twt’. In this
clutter scenario, the ‘mag’

Table 1. Probabilities of ‘tx-tube’ conditioned on the factor ‘component-availability’.

          import    indigenous    develop
twt       0.3636    0.4000        0.2500
mag       0.1818    0.1000        0.6250
cfa       0.4545    0.5000        0.1250


Table 2. Marginal probabilities of ‘component-availability’.

import        0.3333
indigenous    0.5555
develop       0.1111
Table 3. Probabilities of ‘transmitter tube’ conditioned on ‘clutter’.

          strong    weak      none
twt       0.5000    0.4167    0.2000
mag       0.0000    0.1667    0.6000
cfa       0.5000    0.4167    0.2000


Table 4. Probabilities of ‘transmitter tube’ conditioned on ‘power’.

          hi        low
twt       0.2500    0.4167
mag       0.3333    0.2500
cfa       0.4167    0.3333

Table 5. Probabilities of ‘power’ conditioned on ‘pd-pfa’.

          hi-low    hi-hi     low-low
hi        0.8000    0.8000    0.4000
low       0.2000    0.2000    0.6000

Table 6. Probabilities of ‘power’ conditioned on ‘range’.

          long      medium    short
hi        0.6667    0.5000    0.3333
low       0.3333    0.5000    0.6667


tube is not a good choice. In a similar fashion, when prompted by the IDSS, the project
engineer inputs his or her preferred choice of transmitter tube, based on previous experience
or knowledge, given other factors such as the required operating power of the tube; this is
shown in table 4. However, the transmitter power requirement is itself governed by other
factors, which are better known to a radar system engineer than to the project engineer, who
is an expert in the transmitter sub-system alone. The radar system engineer inputs this
knowledge in tables 5 and 6, which contain the probabilities of the transmitter power
requirements for the various outcomes of the required ‘pd-pfa’ and ‘range’, based on certain
system calculations. The two tables again represent the preference of the radar system
engineer to go in for either a high power system
Table 7. Probabilities of ‘power’ conditioned on ‘target characteristics’.

          strong    weak
hi        0.2500    0.7500
low       0.7500    0.2500

Table 8. Probabilities of ‘pd-pfa’ conditioned on the influencing factor ‘clutter’.

          strong    weak      none
hi-low    0.1429    0.2222    0.4000
low-hi    0.5714    0.3333    0.3000
low-low   0.2857    0.4444    0.3000

Table 9. Probabilities of ‘pd-pfa’ conditioned on ‘range’.

          long      medium    short
hi-low    0.1250    0.2222    0.5714
low-hi    0.3750    0.4444    0.2857
low-low   0.5000    0.3333    0.1429

Table 10. Probabilities of ‘pd-pfa’ conditioned on ‘target-characteristics’.

          strong    weak
hi-low    0.6667    0.1250
hi-hi     0.1667    0.5000
low-low   0.1667    0.3750


or a low power system considering the fact that the radar has a long operating range or a 
short operating range, and the probability of detection of the targets and probability of false 
alarms. Table 7 has the probabilities of the transmitter power requirements considering the 
fact that the target returns may be strong or weak, which are again input by the radar 
system engineer. The values of tables 8, 9 and 10 show the probabilities of the ‘pd-pfa’ 
values (probability of detection and of false alarm) that one is likely to get considering the 
various factors such as ‘range’, ‘target’ returns, and ‘clutter power’, which are provided 
by the radar system engineer based on his/her previous experiences. Tables 11, 12 and 
13 show the marginal probabilities of the ‘range’, ‘target characteristics’ and ‘clutter’ 
Table 11. Marginal probabilities of ‘range’.

long      0.5714
medium    0.2857
short     0.1429

Table 12. Marginal probabilities of ‘target characteristics’.

strong    0.2500
weak      0.7500

Table 13. Marginal probabilities of ‘clutter’.

strong    0.4615
weak      0.3077
none      0.2308

Table 14. The ratings of the three choices for the ‘tx-tube’ as evaluated by the algorithm.

twt    0
mag    0
cfa    4


that the project manager expects the radar to encounter. Table 14 shows the result of the
evaluation of the influence diagram of all the factors that are likely to influence the
decision of selecting the transmitter tube. It is clear from the table that the ‘cfa’ type of
transmitter tube is the best choice given the present information. For the different clutter
scenarios, the probabilities of choosing between the different types of radar, so that the
best results may be obtained, are presented in table 15. Table 16 shows the type of radar
that one is likely to build given that the transmitter tube chosen is ‘cfa’. Finally, table 17
shows the result of evaluating the influence diagram corresponding to the decision about the
choice of the alternative.

It may be reiterated here that the user may input the ratings instead of probabilities, since 
the human mind finds it easier to deal with integral values. For example, the radar engineer 
may input his preference for a transmitter tube given the fact that the component has to be 
chosen from the indigenously available tubes, in the form of 4:1:5 for twt:mag:cfa. 
Table 15. Probabilities of the three options conditioned on ‘clutter’.

                  strong    weak      none
coherent          0.5714    0.3750    0.1429
coh-on-receive    0.2857    0.5000    0.1429
non-coherent      0.1429    0.1250    0.7143


Table 16. Probabilities of the three options conditioned on ‘transmitter tube’.

                  cfa
coherent          1.0000
coh-on-receive    0.0000
non-coherent      0.0000

Table 17. The ratings of the three options as evaluated by the algorithm.

coherent          3
coh-on-receive    0
non-coherent      0


6. Conclusion 

This paper briefly discusses the necessity and importance of an IDSS in project management,
and goes on to describe an IDSS which uses a new project management tool, the project
influence graph, together with an AI search algorithm (A*) and an influence diagram
evaluation algorithm, to output a feasible schedule of activities that takes into
consideration the constraints of resources and time.

References 

Cooper K G 1994 The rework cycle: Vital insights into managing projects. IEEE Eng. Manage.
Rev. 21(3): 4-12

Fox M S 1985 Knowledge representation for decision support. In Knowledge representation for
decision support (ed.) R H Sprague (Amsterdam: North-Holland)

Howard R A, Matheson J E (eds) 1981 Influence diagrams. In The principles and applications of
decision analysis (Menlo Park, CA: Strategic Decisions Group)

Miller A C, Merkhofer M M, Howard R A, Matheson J E, Rice T R 1976 Development of
automated aids for decision analysis (Menlo Park, CA: Stanford Res. Inst.)





Nilsson N J 1981 Problem solving methods. In Artificial Intelligence (New York: McGraw-Hill)

Noronha S J 1993 Intelligent decision support systems for project planning and scheduling.
Ph D thesis, Dept. of Computer Science and Automation, Indian Institute of Science,
Bangalore

Noronha S J, Sarma V V S 1989 Artificial intelligence and knowledge-based approaches for
scheduling problems. In Project Management. Proc. Int. Conf. Expert Systems for
Development, Kathmandu, pp 105-114

Object oriented programming: Special issue, Aug. 1989. Comput. J. 32: 4

Shachter R D 1986 Evaluating influence diagrams. Oper. Res. 34: 871-882

Zadeh L A 1965 Fuzzy sets. Inf. Contr. 8: 338-353

Zhang W R, Chen S, Bezdek J C 1989 Pool2: A generic system for cognitive map development
and decision analysis. IEEE Trans. Syst. Man Cybern. 19: 31-39




Sadhana, Vol. 21, Part 3, June 1996, pp. 345-362. © Printed in India.




Synthesis of unlimited speech in Indian languages using 
formant-based rules 

XAVIER A FURTADO* and ANIRUDDHA SEN

Computer Systems and Communications Group, Tata Institute of Fundamental 
Research, Homi Bhabha Road, Colaba, Mumbai 400 005, India 
email: asen@tifrvax.tifr.res.in 

Abstract. Synthesis of continuous and unlimited speech is a matter of theoretical as well as
technological interest. Independent efforts are needed for synthesis in Indian languages,
which are substantially different from English and other European languages. The paper
discusses basic synthesis issues like text-to-phoneme and phoneme-to-speech conversion and
incorporation of prosody. The three commonly adopted methodologies of concatenation, formant
and articulatory synthesis are compared. The TIFR phoneme-to-speech synthesizer, which
utilizes a standard formant synthesizer as a speech production model, is described, and the
methodology for evolving and organizing formant-based rules to drive the synthesizer is
emphasized. The results of some perception tests are reported and a few potential
applications are suggested. The direction of future work for enhancing the quality and
expanding the scope of the synthesizer is indicated.

Keywords. Speech synthesis; computer speech; Indian language synthesis. 


1. Introduction 

Incorporation of human faculties like speech and vision into machines is a basic issue of
artificial intelligence research. The capabilities of a computer to accept spoken input and
to generate speech output are termed speech recognition and speech synthesis respectively.

Speech synthesis requires a profound understanding of speech production and perception, and
has always been a topic of great interest in speech and cognitive sciences. Recent advances in
computer, communication and information technologies have considerably strengthened the
technological motivation for it, particularly for fast and automatic access to information
through the telephone.

In its simplest form, ‘computer speech’ can be produced by playing out a series of digitally
stored segments of natural speech. The segments may be coded for storage efficiency. The
technique involved is elementary. However, it cannot be extended for producing continuous
speech with unlimited vocabulary, because the context sensitivity of speech makes simple
merging of stored speech segments into larger utterances unacceptable. ‘Unlimited’ speech
synthesis, to which we will restrict our subsequent discussions, has to deal with this and
several other issues.

* Deceased

Over the years, several independent methodologies for unlimited speech synthesis in English
and other European languages have evolved. Although speech synthesis research in India has
greatly benefited from such work, substantial differences in phoneme sets and stress
patterns in Indian languages rule out direct adoption of these techniques and make
indigenous development mandatory.

This paper presents a phoneme-to-speech synthesizer developed by the authors in the speech
laboratory of the Tata Institute of Fundamental Research (TIFR). It can synthesize unlimited
speech in Hindi and in English as spoken by typical Indians. Among similar synthesizers for
Indian languages, it is unique in generating speech wholly by a production model, whereas
other synthesizers utilize stored speech in some way or other. The speech production model
used is a standard formant synthesizer suggested by Klatt (1980), and it is driven by
formant-based rules developed indigenously.

The following sections briefly discuss the issues involved in speech synthesis and the
various methodologies which can be applied to tackle its core problem, phoneme-to-speech
conversion. The TIFR synthesizer is then described in some detail and its performance
reported.

2. Issues related to unlimited speech synthesis 

The basic issues related to unlimited speech synthesis can be categorized as (a) text-to-phoneme
conversion, (b) phoneme-to-speech conversion and (c) application of prosody. Such
categorization breaks down the synthesis problem into fairly independent components and
allows a reasonably modular solution.

2.1 Text-to-phoneme conversion 

An ‘unlimited’ speech synthesizer has to accept any arbitrary input message to be converted
to speech. In most practical situations, it is convenient to specify the messages in textual
form, since text is acceptable to most people and can be easily processed by a computer.

Text-to-phoneme conversion means converting the textual input message into its corresponding
pronunciation, which is specified by a string of phonemes. The task is comparatively easy for
languages where the written text and the pronunciation are related in a simple and
straightforward manner (e.g. Hindi) and is non-trivial where they are not (e.g. English). The
conversion, in general, involves derivation of letter-to-phoneme rules, dictionary formation
and look-up, and morphological analysis. As the input text is analyzed by the text-to-phoneme
module, the module should preferably also extract the prosody-related information (§ 2.3)
derivable from such analysis. Clearly, there can be no such thing as a ‘perfect’ analysis of
text. Text-to-phoneme conversion, therefore, is an open problem and requires continuous
research and development.
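Two of the ingredients named above, dictionary look-up and letter-to-phoneme rules, can be combined as in the following sketch. It works on romanized text, and the lexicon entry, rules and phoneme symbols are invented for illustration; they are not the rules of any system cited here. Morphological analysis and prosody extraction are not shown. Rules are applied greedily, longest letter string first, so that a digraph like ‘sh’ takes precedence over ‘s’.

```python
# Hypothetical exception lexicon and letter-to-phoneme rules on romanized text.
LEXICON = {"one": ["w", "a", "n"]}
RULES = {
    "sh": ["S"], "th": ["T"], "ee": ["i:"], "oo": ["u:"],
    "a": ["a"], "e": ["e"], "i": ["i"], "o": ["o"], "u": ["u"],
    "h": ["h"], "k": ["k"], "n": ["n"], "p": ["p"], "r": ["r"],
    "s": ["s"], "t": ["t"], "w": ["w"],
}
LONGEST = max(len(key) for key in RULES)

def word_to_phonemes(word):
    """Look the word up in the lexicon; otherwise apply rules, longest letter string first."""
    word = word.lower()
    if word in LEXICON:
        return list(LEXICON[word])
    phonemes, i = [], 0
    while i < len(word):
        for size in range(min(LONGEST, len(word) - i), 0, -1):
            chunk = word[i:i + size]
            if chunk in RULES:
                phonemes.extend(RULES[chunk])
                i += size
                break
        else:
            raise ValueError(f"no rule covers {word[i]!r} in {word!r}")
    return phonemes
```

For example, ‘sheep’ is not in the lexicon and is converted by rule to `["S", "i:", "p"]`, while ‘one’, whose spelling is irregular, comes from the lexicon.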

As stated earlier, ours is presently a phoneme-to-speech synthesizer. However, it can be
converted to a complete text-to-speech system by adding an appropriate text-to-phoneme module
at the front end. Such modules, with acceptable quality, have already been developed for some
Indian languages (Bhaskararao & Mathew 1992; Bhaskararao et al 1994). But until now, very
little work has been done on text-to-phoneme conversion for Indian English.


2.2 Phoneme-to-speech conversion 

Phoneme-to-speech conversion can be termed as the ‘essence’ of speech synthesis and the 
work presented here is mostly concerned with this. Various phoneme-to-speech conversion 
methodologies in general are discussed in § 3, whereas our specific work is described in 
§§ 4, 5 and 6. 

2.3 Application of prosody 

Prosody, in simple acoustic terms, means the variation of pitch, intensity and (intrinsic)
duration of the utterances with time. Proper prosody must be applied to make synthetic speech
sound natural, and at least a minimal amount is needed even to make the speech intelligible.
Determination of proper prosody depends on analysis of syntax (grammar), semantics (meaning)
and pragmatics (linguistic context); it is therefore also an open research problem.

In our synthesizer, only elementary prosodic rules have been applied so far (these are
discussed in § 6.7). The immediate emphasis was on making the synthetic speech intelligible.
As this was achieved with a reasonable degree of success, the next step is to incorporate
better prosodic rules for making the speech more natural-sounding.

3. Various methodologies for phoneme-to-speech conversion 

For phoneme-to-speech conversion in an unlimited speech synthesizer, the problem can be
broken down into the following sub-problems: (i) selection of basic units, (ii) generation of
the selected basic units and (iii) concatenation of the basic units for synthesizing
continuous speech.

Whatever the methodology adopted, in order to generate unlimited speech out of a limited
corpus of stored information, (i) the selected basic units must be small enough (e.g.
phonemes, syllables) for the total inventory to be limited, and (ii) rules of some kind must
be applied for concatenating such units into continuous speech.

The methodologies commonly used can be divided into two broad categories: (a) synthesis by
concatenating stored natural speech segments (coded or uncoded) and (b) synthesis by
generating speech from a speech production model. The latter, in turn, can be classified into
(i) formant synthesis, where the model is based on several formant frequency resonators, and
(ii) articulatory synthesis, where an effort is made to model mathematically the static and
dynamic characteristics of the various articulators. A brief description of these three
methodologies, along with a comparative study, follows.


3.1 Synthesis by concatenation 

A fundamental problem of speech synthesis is that the acoustic manifestations of the basic
units of utterances, i.e. phonemes, are very sensitive to the phonemic context. Also, the
transition from one phoneme to another carries a substantial amount of information necessary
for the perception of both phonemes.

In concatenation synthesis, a limited number of stored segments obtained from real speech are
used. However, for the reasons mentioned above, the inventory cannot be just a set of single
phonemes. A phoneme cut out from one context would most probably not fit into another, and a
sequence generated by pasting such phoneme splices would be completely unintelligible. In
order to capture the contextual variations and transitions, splices at least equivalent to
diphones have to be taken as basic units. In addition, if the transitions are not to be
missed, the end-points of the splices must be in the ‘steady’ regions of speech.

Diphones and demi-syllables are some of the basic units normally selected for the purpose
(Dan et al 1990; Bhaskararao & Mathew 1992). For Indian languages, even ‘characters’ from the
written scripts (consisting of CV syllables like ‘koo’, vowels like ‘ee’ and consonants like
‘p’, each of which can be represented by a single written character) have been used as basic
units (Rajesh Kumar et al 1989). In general, if n is the total number of phonemes, the number
of elements in a diphone-type inventory will be of the order of n². With a typical number of
phonemes like 50, the inventory will run to 2500 or so. However, as not all diphone
combinations occur, and as a single splice can often be made to represent ‘a group’ with
minimal degradation of quality, sizable pruning of the inventory is possible.
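The n² growth and the effect of pruning can be checked by enumeration. In this sketch the phoneme labels are placeholders rather than a real phoneme set. The filter that drops consonant-consonant pairs is a made-up stand-in for the observation that not all diphone combinations occur; it is not a claim about the phonotactics of any language.

```python
from itertools import product

def diphone_inventory(phonemes, allowed=lambda a, b: True):
    """All ordered phoneme pairs (a, b), optionally pruned by a phonotactic filter."""
    return [(a, b) for a, b in product(phonemes, repeat=2) if allowed(a, b)]

vowels = [f"V{i}" for i in range(10)]        # placeholder labels, not a real phoneme set
consonants = [f"C{i}" for i in range(40)]
phonemes = vowels + consonants                # n = 50

print(len(diphone_inventory(phonemes)))      # 2500 = 50**2
# Hypothetical pruning rule, for illustration only: drop consonant-consonant pairs.
no_cc = lambda a, b: not (a in consonants and b in consonants)
print(len(diphone_inventory(phonemes, no_cc)))   # 900 = 2500 - 40**2
```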

The basic advantage of the concatenation method is its simplicity. As the units are taken
from real speech, the cumbersome task of ‘generating’ them is obviated. Complicated
transitions like CV are automatically ‘captured’ in totality by the actual speech data, and
the ‘rules’ for concatenating the basic units are therefore elementary. This allows
development of acceptable systems with limited investment of time and specialized skill. If
the original time waveforms are stored, the processing is basically in the time domain and
the synthesis is very fast. It is then possible to make real-time synthesizers on
general-purpose machines like PCs.

Some problems, however, are associated with this method. During data collection, it is not
easy to ensure that all the steady-state ends of the segments (e.g. all demi-syllables ending
and beginning with l\l) match exactly in formant frequencies etc. This can lead to ‘jerks’ at
the joints. Also, this method obviously needs some ‘steady’ segments of speech. However, in
fluent speech with substantial co-articulation, such steady states will often be
non-existent, and it is difficult to capture such fluency with this method. Many consonants
are almost completely ‘transitory’ in nature, and it is difficult to reconstruct various
clusters of such consonants from segments with steady-state end-points. Another problem is
that, while storing segments, the allophones also have to be added to the list of phonemes,
and this increases the inventory (proportional to the square of the number of phonemes plus
allophones) substantially. Also, in this method, the voice is ‘personalized’, and a totally
new segment inventory has to be prepared for generating a different voice.

3.2 Formant synthesis 

This method, adopted by us and described in detail later (§§ 4, 5, 6), is based on the
generation of speech from a production model which has formant frequencies, energies, voice
source control parameters (e.g. pitch) and a few other acoustic-phonetic parameters as
control variables. It is possible to generate any arbitrary speech by applying a set of
context-sensitive rules to the stored phonemic data, both to enact contextual modifications
and to generate transition segments.

This method can overcome many of the limitations of the concatenation method. With 
single phonemes as the basic units, rules for smooth concatenations can be formulated. With 
only a few additional rules and little additional data, consonant clusters and allophones can 
be handled elegantly. As a ‘steady state’ is not mandatory, co-articulation
can be incorporated. Also, as the model abstracts the mechanism of speech production, 
the resulting voice is not personalized and can be altered easily by changing a few control 
parameters. The effort needed to switch over to another similar language with a marginally 
different phoneme set is also minimal and this is very important in a multi-lingual country 
like India. Overall, this synthesizer has the potential to surpass a concatenation synthesizer 
in both quality and versatility. 

However, in order to realize this potential, a set of appropriate rules must be derived,
and this is not a trivial task, particularly because such rules are basically ‘heuristic’ in
nature. The time and specialized skill needed to develop such a synthesizer are therefore
high. In spite of the best efforts, there will be approximations at two levels: in modelling
the speech production, and in capturing the variations of the model control parameters with
a finite set of rules. A continuous effort to minimize the deviations is therefore called
for. Fortunately, the synthesizer framework can be made flexible enough to allow gradual
improvements. As the model uses a number of resonators, it is computationally expensive.
But with the current ‘hardware revolution’, affordable near-real-time synthesizers
using this technique are now within the realm of possibility.

3.3 Articulatory synthesis 

For generating various utterances, the formant synthesizers control acoustic-phonetic
features like formant frequencies, energies, pitch etc., whereas when we speak, we have no
independent control over them. This necessitates ‘heuristics’ for rule formation. Articulatory
synthesis strives to eliminate the problem by modelling the actual mechanism of
speech production through the articulators and their movements. Thus it is expected that
more elegant and conceptually clearer rules can be formulated. It should also be easier
to capture the production mechanism in totality if the modelling is done on the basis of
a finite and exhaustive set of articulatory movements rather than on the basis of the
variation of a subjectively selected set of acoustic-phonetic features (as is done in formant
synthesis).


Here, too, the problem is to realize the theoretical potential by optimizing the mathematical
model and collecting sufficient data (including X-ray data of articulator movements
during production of various types of utterances). This method is also computationally
very expensive. Overall, although this can be described as the synthesis strategy of the
future, at the moment it is more a research issue than a commercial reality, and more so in
India.


4. TIFR synthesizer overview 

The phoneme-to-speech synthesizer developed at TIFR is entirely a software synthesizer.
The only special hardware it needs is a D/A converter, along with de-aliasing filters,
for playing back the synthetic speech. A sampling frequency of 10 kHz and a de-aliasing
low-pass filter cut-off frequency of about 4.7 kHz are used. The synthesizer was implemented
on a Microvax II computer, where it ran at around 24 times real time (i.e. 24 s were
needed to synthesize 1 s of speech). A demonstration version was also implemented on a
PC-386, where it ran at around 13 times real time.

As input, the synthesizer accepts a string of phonemes from a repertoire made up of
57 phonemes normally used in Hindi and Indian English, together with markers for silence
and for the question mark. Figure 1 lists these phonemes (and their classes) along with
their one- or two-character symbols, which can be entered from an English keyboard. (Note: if
the second character of a symbol is ‘1’, it can optionally be omitted; thus ‘k1’ and ‘k’
are equivalent.) The synthetic speech output can be generated in any of four voices:
(i) low-pitched male, (ii) high-pitched male, (iii) high-pitched female and
(iv) low-pitched female. However, only the male voices have been fine-tuned.

Figure 1. Phoneme repertoire.
1. Silence
2. Vowels: uh, aa, eq, ee, uq, oo, ey, ae, oh, aw, eh
   Nasalized vowels: uh$, aa$, eq$, ee$, uq$, oo$, ey$, ae$, oh$, aw$, eh$
3. Stops: k1, k2, g1, g2, t1, t2, d1, d2, t3, t4, d3, d4, p1, p2, b1, b2
4. Affricates: c1, c2, j1, j2
5. Nasals: m, n1, n2, n3
6. Semi-vowels: y, w
7. Trills/Flaps: r, d5
8. Laterals: l
9. Fricatives: ss, sh, h, z, f, v

Figure 2. Block diagram of the TIFR synthesizer.

The synthesizer (figure 2) is made up of two fairly independent modules. (i) The
acoustic-phonetic speech production module, or parameter-to-speech module (§ 5), accepts a
number of acoustic-phonetic parameters updated at regular intervals as input and generates
the corresponding synthetic speech. This is essentially the formant-based speech production
model. (ii) The synthesis-by-rule module, or phoneme-to-parameter module (§ 6), generates
the input parameters needed by the parameter-to-speech module. It is basically the
formant-based rule module, which takes the current and adjacent phonemes into consideration
while generating each parameter contour. Currently, each parameter is updated at an
interval of 5 ms, which is small enough to capture most of the variations in speech.
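With a 5 ms update interval and the 10 kHz sampling rate given above, each parameter frame covers 50 output samples. The Python sketch below shows one simple way a parameter-to-speech stage could step through such frames, holding each frame's value constant for its 50 samples. The function names and the sample-and-hold policy are assumptions for illustration, not a description of the TIFR implementation.

```python
def samples_per_frame(frame_ms=5.0, fs_hz=10_000.0):
    """Number of output samples covered by one parameter frame."""
    return round(frame_ms * fs_hz / 1000.0)

def hold_frames(frame_values, n_per_frame):
    """Expand one value per frame into one value per sample (sample-and-hold)."""
    out = []
    for v in frame_values:
        out.extend([v] * n_per_frame)
    return out

n = samples_per_frame()                          # 5 ms at 10 kHz -> 50 samples
f1_per_sample = hold_frames([700.0, 650.0, 600.0], n)  # illustrative F1 frames (Hz)
print(n, len(f1_per_sample))                     # 50 150
```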

An important aspect of any synthesis or recognition methodology is the extent to which it
is language-independent, since this determines how far the advanced techniques developed
internationally can be utilized. For the synthesis methodology described here,
whereas the speech production model can be considered reasonably valid for any language,
the phoneme-to-parameter conversion rules, which have to capture the variation of
acoustic-phonetic features corresponding to the specific articulations of a given language,
must be language-specific.

To start with, it was therefore decided to have a speech production model similar to the 
one described by Klatt (1980). The set of rules for generating the time variations of its 
control parameters was developed fully at TIFR. 


5. Parameter to speech conversion

The main components of this module are: (i) a voicing source, (ii) a noise source and (iii) a
series of resonators and anti-resonator(s). By proper arrangement of these components,
virtually any speech sound can be faithfully synthesized.

The voicing source used is an impulse train with a periodicity corresponding to the given
pitch period, which is then shaped by an appropriate low-pass filter. Provision for incorporating
a more ‘natural’ voicing source exists. This source is used to generate vowels as well as
voiced consonants.

The noise-like sounds (e.g. frication or a burst from the release of a stop consonant) 
require a different type of source. It is implemented by a random noise generator and the 
output is again shaped by ‘soft filtering’. 
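The two excitation sources can be sketched in Python as follows. The impulse spacing follows directly from the pitch (at 10 kHz, a 100 Hz pitch gives one impulse every 100 samples). The paper does not say which low-pass filter or noise distribution is used, so the one-pole low-pass (with its coefficient alpha) and the uniform noise here are assumptions, and the subsequent ‘soft filtering’ of the noise is left out.

```python
import random

def impulse_train(num_samples, f0_hz, fs_hz=10_000.0):
    """Unit impulses spaced by the pitch period, expressed in samples."""
    period = max(1, round(fs_hz / f0_hz))
    return [1.0 if n % period == 0 else 0.0 for n in range(num_samples)]

def one_pole_lowpass(x, alpha=0.9):
    """y[n] = (1 - alpha) x[n] + alpha y[n-1]; unity gain at 0 Hz.
    An assumed stand-in for the paper's unspecified shaping filter."""
    y_prev = 0.0
    out = []
    for xn in x:
        y_prev = (1.0 - alpha) * xn + alpha * y_prev
        out.append(y_prev)
    return out

def noise_source(num_samples, seed=0):
    """Uniform random noise in [-1, 1], seeded so runs are repeatable (assumed distribution)."""
    rng = random.Random(seed)
    return [rng.uniform(-1.0, 1.0) for _ in range(num_samples)]

voicing = one_pole_lowpass(impulse_train(1000, f0_hz=100.0))  # 100 Hz pitch
frication = noise_source(1000)
```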






X A Furtado and A Sen 


A series of resonators models the vocal tract. An anti-resonator is incorporated to facilitate
the generation of nasal vowels and nasal consonants and provision exists for adding more, 
if the need arises. 

A digital resonator is implemented (Klatt 1980) by the equation

$$y(nT) = A\,x(nT) + B\,y(nT - T) + C\,y(nT - 2T), \qquad (1)$$

where y(nT), y(nT − T) and y(nT − 2T) are the current and two previous output samples
and x(nT) is the current input sample. The resonator coefficients A, B and C can be computed
by

$$C = -\exp(-2\pi B_w T), \qquad (2)$$

$$B = 2\exp(-\pi B_w T)\cos(2\pi F T), \qquad (3)$$

$$A = 1 - B - C, \qquad (4)$$

where F is the central frequency of the resonator (i.e. the formant frequency), B_w is its
bandwidth and T = 1/(sampling rate), which equals 0.0001 s at the sampling rate of 10 kHz
used here.
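Equations (1) to (4) translate directly into code. The Python sketch below computes the coefficients for a given centre frequency and bandwidth and runs the difference equation on a list of samples, starting from zero state. Because A = 1 − B − C, the filter has unity gain at 0 Hz, which the last line checks with a step input. The function names are illustrative; in a running synthesizer the two previous outputs would be carried across the 5 ms frames while A, B and C are recomputed, whereas this sketch keeps the coefficients fixed.

```python
import math

def resonator_coeffs(f_hz, bw_hz, fs_hz=10_000.0):
    """Coefficients A, B, C of eqs. (2)-(4) for centre frequency F and bandwidth Bw."""
    t = 1.0 / fs_hz                                   # sampling period T
    c = -math.exp(-2.0 * math.pi * bw_hz * t)         # eq. (2)
    b = 2.0 * math.exp(-math.pi * bw_hz * t) * math.cos(2.0 * math.pi * f_hz * t)  # eq. (3)
    a = 1.0 - b - c                                   # eq. (4)
    return a, b, c

def resonate(x, f_hz, bw_hz, fs_hz=10_000.0):
    """Eq. (1): y(nT) = A x(nT) + B y(nT - T) + C y(nT - 2T), from zero initial state."""
    a, b, c = resonator_coeffs(f_hz, bw_hz, fs_hz)
    y1 = y2 = 0.0                                     # y(nT - T), y(nT - 2T)
    out = []
    for xn in x:
        yn = a * xn + b * y1 + c * y2
        out.append(yn)
        y2, y1 = y1, yn
    return out

step_response = resonate([1.0] * 2000, f_hz=500.0, bw_hz=60.0)
print(round(step_response[-1], 6))   # settles at 1.0: unity gain at 0 Hz
```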

For anti-resonators, the equation (Klatt 1980) is

$$y(nT) = A'\,x(nT) + B'\,x(nT - T) + C'\,x(nT - 2T), \qquad (5)$$

where x(nT − T) and x(nT − 2T) are the two previous samples of the input x(nT). The
coefficients are determined by

$$A' = 1/A, \qquad B' = -B/A, \qquad C' = -C/A,$$

where A, B and C are obtained by using the antiresonance centre frequency F and bandwidth
B_w in (2), (3) and (4).
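With these coefficients, the anti-resonator is the exact inverse of a resonator with the same F and B_w: its transfer function A' + B'z⁻¹ + C'z⁻² equals (1 − Bz⁻¹ − Cz⁻²)/A, the reciprocal of the resonator's A/(1 − Bz⁻¹ − Cz⁻²). The self-contained Python sketch below (it repeats the resonator from equations (1) to (4)) checks this by passing an impulse through a resonator and then an anti-resonator with identical, illustrative settings.

```python
import math

def resonator_coeffs(f_hz, bw_hz, fs_hz=10_000.0):
    t = 1.0 / fs_hz
    c = -math.exp(-2.0 * math.pi * bw_hz * t)
    b = 2.0 * math.exp(-math.pi * bw_hz * t) * math.cos(2.0 * math.pi * f_hz * t)
    return 1.0 - b - c, b, c

def resonate(x, f_hz, bw_hz, fs_hz=10_000.0):
    """Two-pole resonator of eq. (1), zero initial state."""
    a, b, c = resonator_coeffs(f_hz, bw_hz, fs_hz)
    y1 = y2 = 0.0
    out = []
    for xn in x:
        yn = a * xn + b * y1 + c * y2
        out.append(yn)
        y2, y1 = y1, yn
    return out

def antiresonate(x, f_hz, bw_hz, fs_hz=10_000.0):
    """Eq. (5): y = A' x(n) + B' x(n-1) + C' x(n-2), with A' = 1/A, B' = -B/A, C' = -C/A."""
    a, b, c = resonator_coeffs(f_hz, bw_hz, fs_hz)
    a2, b2, c2 = 1.0 / a, -b / a, -c / a
    x1 = x2 = 0.0                                   # x(nT - T), x(nT - 2T)
    out = []
    for xn in x:
        out.append(a2 * xn + b2 * x1 + c2 * x2)
        x2, x1 = x1, xn
    return out

impulse = [1.0] + [0.0] * 99
restored = antiresonate(resonate(impulse, 270.0, 100.0), 270.0, 100.0)
print(max(abs(r - i) for r, i in zip(restored, impulse)))  # round-off level only
```

In the synthesizer the anti-resonator is of course tuned to a different frequency from the resonators, to introduce a spectral zero (e.g. for nasals); the identical settings here only demonstrate the inverse relationship.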

The resonators can be connected in cascade or in parallel. Both types of synthesizer
have been used to good effect: for example, Klatt (1980) used a cascade synthesizer,
while Holmes used a parallel one (Holmes et al 1964; Holmes 1983). As our aim
was to spend the least effort on topics already worked on, we started simply by adopting
the cascade synthesizer methodology used by Klatt and pursued it, as no serious problem
was faced.

In this method, the resonators are connected in cascade for generating voiced sounds. 
In other words, the first resonator receives the source pulse train as input and then the 
output of a resonator is fed as the input to the next one. The output of the last resonator is 
the resulting voice output, having spectral poles at all the resonance frequencies. Spectral 
zeros can be similarly incorporated by adding anti-resonators. 
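A cascade can be sketched in Python as repeated application of the resonator of equation (1), each stage feeding the next. The formant frequencies and bandwidths below are illustrative values, not taken from the paper. Since every stage has unity gain at 0 Hz, the whole cascade does too, and since the stages are linear with fixed coefficients, their order does not change the output; the last two lines check both properties.

```python
import math

def resonate(x, f_hz, bw_hz, fs_hz=10_000.0):
    """Two-pole resonator of eqs. (1)-(4), starting from zero state."""
    t = 1.0 / fs_hz
    c = -math.exp(-2.0 * math.pi * bw_hz * t)
    b = 2.0 * math.exp(-math.pi * bw_hz * t) * math.cos(2.0 * math.pi * f_hz * t)
    a = 1.0 - b - c
    y1 = y2 = 0.0
    out = []
    for xn in x:
        yn = a * xn + b * y1 + c * y2
        out.append(yn)
        y2, y1 = y1, yn
    return out

def cascade(x, formants, fs_hz=10_000.0):
    """Series connection: the output of each resonator is the input of the next."""
    y = list(x)
    for f_hz, bw_hz in formants:
        y = resonate(y, f_hz, bw_hz, fs_hz)
    return y

# Illustrative (F, Bw) pairs in Hz for three formants; not values from the paper.
FORMANTS = [(700.0, 60.0), (1220.0, 70.0), (2600.0, 110.0)]

step_out = cascade([1.0] * 3000, FORMANTS)
impulse = [1.0] + [0.0] * 299
forward = cascade(impulse, FORMANTS)
backward = cascade(impulse, FORMANTS[::-1])
print(round(step_out[-1], 6))                                     # 1.0
print(max(abs(p - q) for p, q in zip(forward, backward)) < 1e-9)  # True
```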

A minor problem associated with cascade synthesizers is that the amplitudes of the different
resonance peaks cannot be directly controlled. However, they can be controlled indirectly by
adjusting the bandwidths of the resonators. Furthermore, there is a provision to ‘tilt’ the
spectrum, i.e. to attenuate the higher-frequency components by appropriate filtering. Proper
use of these controls is essential, especially for generating voiced consonants (e.g. voice
bars, nasals).

Figure 3. Acoustic-phonetic speech production module schematic.

For generating the noise spectra, however, the resonators are connected in parallel. This
gives more realistic noise spectra. The filtered random noise is passed through these
parallel resonators, and the amplitude of each resonator is tuned to generate the desired
noise spectrum with specific energy concentration patterns.
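The parallel arrangement can be sketched in Python as follows: the same noise is passed through each resonator, and the branch outputs are scaled by their own amplitudes and summed, which is what allows the energy concentration of a noise spectrum to be shaped band by band. The branch frequencies, bandwidths and amplitudes are hypothetical, chosen only to put most of the energy in the upper bands.

```python
import math
import random

def resonate(x, f_hz, bw_hz, fs_hz=10_000.0):
    """Two-pole resonator of eqs. (1)-(4), starting from zero state."""
    t = 1.0 / fs_hz
    c = -math.exp(-2.0 * math.pi * bw_hz * t)
    b = 2.0 * math.exp(-math.pi * bw_hz * t) * math.cos(2.0 * math.pi * f_hz * t)
    a = 1.0 - b - c
    y1 = y2 = 0.0
    out = []
    for xn in x:
        yn = a * xn + b * y1 + c * y2
        out.append(yn)
        y2, y1 = y1, yn
    return out

def parallel(x, branches, fs_hz=10_000.0):
    """Parallel connection: each branch filters the same input, and the outputs are
    scaled by the branch amplitude and summed. branches = [(F_hz, Bw_hz, amp), ...]"""
    out = [0.0] * len(x)
    for f_hz, bw_hz, amp in branches:
        for n, v in enumerate(resonate(x, f_hz, bw_hz, fs_hz)):
            out[n] += amp * v
    return out

rng = random.Random(1)                                   # seeded for repeatability
noise = [rng.uniform(-1.0, 1.0) for _ in range(500)]
# Hypothetical fricative-like setting: energy concentrated in the higher bands.
BRANCHES = [(2500.0, 200.0, 0.3), (3500.0, 250.0, 0.6), (4500.0, 300.0, 1.0)]
frication = parallel(noise, BRANCHES)
```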

For generating the ‘aspiration’ sound (the one found in /h/ or in aspirated consonants), the
random noise is instead passed through the cascade resonators. As aspiration is essentially
frication near the vocal fold and the noise passes through the entire vocal tract (unlike other
constrictions, which are made somewhere within the vocal tract), it is better modelled this
way.

Currently, six resonators and one anti-resonator are being used. 

Figure 3 gives a block-level representation of the parameter to speech conversion
scheme.


6. Formant based rules

For parameter to speech conversion, the production model needs control parameter streams
whose time-varying values correspond to any arbitrary input phoneme sequence. To produce
the streams automatically, a comprehensive set of rules is employed. The techniques
for their formulation and organization are described hereafter in some detail.

6.1 Indigenous development

The rules are evidently language-dependent. To state a few differences between Indian
languages and English: (i) Many Indian languages (e.g. Hindi) have separate aspirated and
unaspirated consonants. (ii) Likewise, there are separate retroflexed and non-retroflexed
consonants in Indian languages. (iii) The criteria for distinguishing voiced and unvoiced
consonants are not identical for Indian languages and English. (iv) Most Indian languages
have nasalized vowels, which do not feature in English (although they are present in
some European languages like French). (v) Also, the American/British accent is much less
acceptable to native Indian listeners.

As a result, a totally indigenous set of rules had to be evolved, and this activity has been
the centre of our attention from the start. The task was made even more difficult by
the lack of adequate information regarding the organization of rules in other languages,
presumably because of proprietary restrictions.

6.2 Classifying phonemes and parameters for reducing rules 

As mentioned earlier, the acoustic-phonetic manifestations of phonemes are context-dependent.
In order to capture their variations, the rules must also be context-sensitive. If we
consider even a diphone context and the independent variation of each and every
parameter, a combinatorial explosion clearly results.

The speech production model has about 40 control variables. However, quality synthesis
is possible by keeping many of them constant throughout the utterance and varying only
up to about 20 parameters. Even then, if there is a rule for concatenating each varied
parameter for each diphone context, more than 50,000 rules would be needed.

Fortunately, phonemes can be classified into types which exhibit similar transition
characteristics. This was first shown by Liberman et al (1959). Rao & Thosar (1974) extended
the concept to Indian phonemes. Phonemes can be classified according to their manners
of articulation or places of articulation or both. We have classified them according to their
manners of articulation in order to take advantage of the obvious similarities in the
steady-state and transition characteristics of phonemes thus classified.

Theoretically, a further reduction of rules and phoneme data storage by classifying
consonants according to their places of articulation (velar, palatal, retroflex, dental, labial)
is possible. The resulting reduction can be cost-effective in methods where stored speech
segments are used. However, in our method, which organizes the rules in a compact manner
(§ 6.4) and uses a minimum amount of data for each phoneme, the gain is insignificant. For
example, phonemes with the same place of articulation could use the same locus equation
(§ 6.6). The saving in memory storage for the locus equation coefficients would be
small and would hardly compensate for the sacrifice of an accurate representation of the
formant movements of these phonemes.

After considering the costs and benefits, further sub-classification of the phonemes
according to their places of articulation was not attempted. The phoneme classes formed
solely on the basis of the manners of articulation are: (i) silence, (ii) vowels, (iii) stops,
(iv) affricates, (v) nasals, (vi) semi-vowels, (vii) trills/flaps, (viii) laterals and (ix) fricatives
(figure 1).

Consequently, concatenation rules are formed according to the classes of the
phonemes, and this results in a dramatic reduction in the number of rules. Further
reduction was achieved by dividing the control parameters into groups which show
similar variation patterns in any given context (e.g. the frequencies and bandwidths of
formants).
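The scale of this reduction can be estimated with a few lines of Python. The sketch compares one rule per phoneme pair per varied parameter with one table entry per class pair, per parameter group and per segment type (steady-state and transition). The 57 phonemes, nine classes and roughly 20 varied parameters come from the text; the figure of four parameter groups is a hypothetical placeholder, since the number of groups actually used is not stated.

```python
def phoneme_level_rules(n_phonemes=57, n_params=20):
    """One rule per (previous, current) phoneme pair per varied parameter."""
    return n_phonemes * n_phonemes * n_params

def class_level_entries(n_classes=9, n_param_groups=4, n_segment_types=2):
    """One table entry per (previous, current) class pair, per parameter group,
    per segment type. n_param_groups=4 is a hypothetical placeholder."""
    return n_classes * n_classes * n_param_groups * n_segment_types

print(phoneme_level_rules())   # 64980, i.e. "more than 50,000"
print(class_level_entries())   # 648
```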
6.3 Steady-state and transition segments

For convenience of design, the duration of each phoneme is divided into a central steady- 
state segment and a transition segment at each end. The ‘steady state’ is steady only in a
relative sense. It is a central reference segment for a phoneme, from which
we can proceed to construct parameter trajectories at each end. The steady-state segments 
may also be influenced by context and may embody transitions, but to a lesser degree 
in comparison with the transition segments. The transition segments, on the other hand, 
connect the steady-state segments of adjacent phonemes. For example, for a vowel-vowel 
combination the transition is done by directly interpolating the steady-state formant values 
of one to the other. In the case of stop consonants and affricates, the closure period is 
considered as the steady-state segment while the burst/frication and the formant-movement 
regions are collectively designated as the transition segment. Separate sets of rules are 
applied for constructing parameter trajectories in steady-state and transition segments. 
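For the vowel-vowel case described above, the transition amounts to a straight-line interpolation between the two steady-state values. The Python sketch below builds one formant track, one value per 5 ms frame, as steady segment, transition and steady segment. The frame counts and the F1 values (700 Hz to 300 Hz) are hypothetical.

```python
def vowel_vowel_track(f_first, f_second, steady_first, n_trans, steady_second):
    """Formant track in frames: steady value, linear transition, steady value.
    Transition frames step from f_first towards f_second; the second vowel's
    steady state starts exactly at f_second."""
    step = (f_second - f_first) / (n_trans + 1)
    transition = [f_first + step * (i + 1) for i in range(n_trans)]
    return [f_first] * steady_first + transition + [f_second] * steady_second

track = vowel_vowel_track(700.0, 300.0, steady_first=2, n_trans=3, steady_second=2)
print(track)   # [700.0, 700.0, 600.0, 500.0, 400.0, 300.0, 300.0]
```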


6.4 Compact and flexible organization of rules 

In anticipation of substantial additions to and refinements of the rules, these were organized
so as to give a substantial amount of flexibility as well as compactness, and this is
one of the salient features of our synthesizer. This is implemented essentially by means
of a set of two-dimensional tables (corresponding to each parameter class) which control 
the selection of ‘interpolation types’ (both for steady-state as well as transition segments) 
depending upon the classes of the adjacent phonemes. An ‘interpolation type’ specifies 
the manner in which a specific parameter in a specific context will vary. For example, it 
can linearly interpolate to the next target (as formant frequencies mostly do), can have 
a sharp change followed by a more gradual change (as does amplitude of voicing for a 
stop-vowel combination) or can just jump to the next target (as the amplitude of frication 
does at the plosive burst of a stop consonant). The interpolation type at a given context can 
be changed by just altering a few table entries. New interpolation types can be added, if 
needed, with little extra effort. It is even possible to change the classification of phonemes 
and parameters (e.g. if we want to treat voiced and unvoiced stops as different ‘types’, a few
more entries need to be added). In this way, a high degree of flexibility is attained. It is also
clear that it makes the ‘rules’ very compact, as they are basically embodied into tables and 
the implementation of a handful of interpolation types. 
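A minimal Python sketch of this table-driven organization follows. The three interpolation types mirror the examples in the text (linear movement, a sharp change followed by a gradual one, and a jump). The parameter-group names, the table entries, the 0.5 fraction in the sharp-then-gradual type and the fallback to linear interpolation are all hypothetical choices, not the actual TIFR tables.

```python
def linear(start, target, n):
    """Straight-line move to the target over n frames (e.g. formant frequencies)."""
    return [start + (target - start) * (i + 1) / n for i in range(n)]

def jump(start, target, n):
    """Immediate change to the target (e.g. frication amplitude at a plosive burst)."""
    return [target] * n

def sharp_then_gradual(start, target, n, frac=0.5):
    """Cover `frac` of the change in the first frame, then move linearly.
    frac=0.5 is an illustrative assumption."""
    if n <= 1:
        return [target] * n
    first = start + frac * (target - start)
    return [first] + linear(first, target, n - 1)

INTERPOLATORS = {"linear": linear, "jump": jump, "sharp_then_gradual": sharp_then_gradual}

# One 2-D table per parameter group: (previous class, current class) -> interpolation type.
# Hypothetical entries; contexts not listed fall back to "linear".
TRANSITION_TABLES = {
    "formant_freq": {},
    "voicing_amp": {("stop", "vowel"): "sharp_then_gradual"},
    "frication_amp": {("stop", "vowel"): "jump"},
}

def transition(group, prev_cls, cur_cls, start, target, n):
    """Look up the interpolation type for this context and apply it."""
    kind = TRANSITION_TABLES[group].get((prev_cls, cur_cls), "linear")
    return INTERPOLATORS[kind](start, target, n)

print(transition("voicing_amp", "stop", "vowel", 0.0, 100.0, 3))  # [50.0, 75.0, 100.0]
```

Changing the behaviour in one context means editing a single dictionary entry, which is the kind of flexibility the text describes; exception rules, discussed next, would be checked before the table lookup.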

However, it will often be necessary to recognize the ‘independent’ feature(s) of some
phonemes which cannot be captured by the class characteristics. Such situations are
taken care of by ‘exception’ rules. For example, among the fricatives, only the formant
frequencies of /h/ show an affinity towards the adjacent vowels, and this ‘allophonic
variation’ is taken care of by rules specific to /h/. As the number of exception rules
could be kept quite small, the general organization, based on the classification of phonemes
and parameters, is vindicated.

Whereas the bulk of the rules in this system are implicitly embodied in tables and their
implementations, only the ‘exception’ rules are specified as explicit if-then-else constructs.
6.5 Derivation of rules and extraction of phoneme data
Rules are mainly derived using the following steps: 

1. Formation of the framework for rule organization by the selection of suitable groups 
for phonemes and parameters. 

2. Selection of a set of ‘interpolation types’ for capturing the variation of the parameters 
in the contexts of all possible phonemic classes. 

3. Provision of a means of determining which ‘interpolation type’ is to be selected in 
a given phonemic context. This is done for each parameter group by means of a two- 
dimensional table for the steady-state segments and another for the transition segments. 

4. Implementation of the ‘interpolation types’. 

5. Derivation of ‘exception’ rules. 

Each of these tasks calls for suitable acoustic-phonetic knowledge. No doubt, analysis 
of speech and measurement of its features form the basis of acquisition of such knowledge. 
However, the rules and their framework cannot be directly constructed from the measured 
data. They are rather obtained by the interpretation of data and application of intuition 
(heuristics). The rules thus theorized are validated and refined by actually using them for 
synthesis and testing the output by means of perception experiments. 

The rules need data pertaining to each phoneme to operate on. A table containing the
steady-state parameters for each phoneme is maintained. Other important types of data
stored are: (a) the burst spectra (amplitudes for each formant frequency during the noise
burst) for each stop consonant, with different entries stored for front, mid and back vowel
contexts if needed; (b) the normal steady-state duration of each phoneme; (c) the transition
time for each phonemic context; and (d) the voice-onset time (VOT) for each stop consonant,
i.e. the time difference between the noise burst and the onset of voicing.

Most of these data are initially estimated from spectrographic analysis of real speech. 
They are however refined through trial and error for best quality synthesis. 

6.6 Locus equations 

Among the variable parameters, formant frequencies are the most important ones for the
perception of particular utterances. Generating formant trajectories accurately is therefore
of the utmost importance.

It is observed that the terminal (entry/exit) formant frequencies for a phonemic segment 
are not fixed, but are considerably influenced by the context. For example, if a phonemic 
segment /p/ is preceded by the vowel /u/, then the second formant frequency at the onset 
of the stop will be very low (about 600 Hz), whereas it will be considerably higher (about
1400 Hz) if the preceding vowel is l\l. Several methods to capture this variation of the 
terminal formant frequencies for a consonant corresponding to different vowels have been
described in the literature. We used Klatt’s (1987) ‘modified locus equation’, which is given
by:


$$F_{onset} = F_{locus} + k\,(F_{vowel} - F_{locus}). \qquad (6)$$
Here, F_onset is the formant frequency either while entering into a vowel from a given
consonant or while exiting from a vowel into the same consonant. F_vowel is the steady-state
formant frequency of the vowel. The frequency F_locus and the constant k are characteristic
features of any given consonant. To find them, the values of F_onset corresponding to as many
F_vowel frequencies as possible are found and plotted for each consonant. As (6) is linear, the
best straight line through all the plotted points gives the best estimates of F_locus and k. A set
of values of k and F_locus is kept for all consonants and for each variable formant. Separate
values can be stored for CV and VC contexts, and provision also exists for keeping separate
values for front, back and rounded vowels.
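Rewriting (6) as F_onset = k F_vowel + (1 − k) F_locus shows why a straight line suffices: its slope is k and its intercept is (1 − k) F_locus. The Python sketch below evaluates (6) and recovers F_locus and k from (F_vowel, F_onset) points by ordinary least squares, one standard way of finding the best straight line (the text does not say which fitting method was used). The measurement values are synthetic, generated from F_locus = 1800 Hz and k = 0.5, and do not describe any particular consonant.

```python
def locus_onset(f_locus, k, f_vowel):
    """Eq. (6): F_onset = F_locus + k (F_vowel - F_locus)."""
    return f_locus + k * (f_vowel - f_locus)

def fit_locus(f_vowels, f_onsets):
    """Least-squares line F_onset = k F_vowel + (1 - k) F_locus; returns (F_locus, k)."""
    n = len(f_vowels)
    if n < 2:
        raise ValueError("need at least two (F_vowel, F_onset) points")
    mx = sum(f_vowels) / n
    my = sum(f_onsets) / n
    sxx = sum((x - mx) ** 2 for x in f_vowels)
    if sxx == 0.0:
        raise ValueError("F_vowel values must not all be equal")
    sxy = sum((x - mx) * (y - my) for x, y in zip(f_vowels, f_onsets))
    k = sxy / sxx                    # slope of the fitted line
    intercept = my - k * mx          # equals (1 - k) * F_locus
    if abs(1.0 - k) < 1e-12:
        raise ValueError("k = 1: F_locus is undefined")
    return intercept / (1.0 - k), k

vowels = [800.0, 1200.0, 1600.0, 2200.0]                   # synthetic F2 values (Hz)
onsets = [locus_onset(1800.0, 0.5, f) for f in vowels]     # [1300.0, 1500.0, 1700.0, 2000.0]
print(fit_locus(vowels, onsets))                           # (1800.0, 0.5)
```

With real measurements the points will scatter about the line, and the fit gives the best estimate in the least-squares sense rather than an exact recovery.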

6.7 Duration rules and intonation 

A few elementary duration rules which depend on the phonemic context and not much on 
the linguistic context were applied. These include rules for shortening of a vowel adjacent 
to a stop consonant and for pre-pausal lengthening of a phoneme. 

As for intonation, a flat pitch is used except for the last vowel when the pitch is raised or 
dropped, depending on whether it is an interrogative or assertive sentence (it is determined 
by the presence/absence of an interrogation mark). 
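The duration and intonation rules just described can be sketched in Python as below. The text names the rules but gives no values, so the shortening and lengthening factors, the 120 Hz base pitch and the 30 Hz excursion are hypothetical, as is the '#' silence symbol; the pitch is ramped linearly over the frames of the last vowel and held afterwards.

```python
# Hypothetical factors: the rules are described in the text but no values are given.
SHORTEN_NEAR_STOP = 0.75
PREPAUSAL_LENGTHEN = 1.5

def apply_duration_rules(segments):
    """segments: list of (symbol, phoneme_class, duration_ms).
    A vowel next to a stop is shortened; a phoneme before a pause is lengthened."""
    out = []
    for i, (sym, cls, dur) in enumerate(segments):
        prev_cls = segments[i - 1][1] if i > 0 else "silence"
        next_cls = segments[i + 1][1] if i + 1 < len(segments) else "silence"
        if cls == "vowel" and "stop" in (prev_cls, next_cls):
            dur *= SHORTEN_NEAR_STOP
        if cls != "silence" and next_cls == "silence":
            dur *= PREPAUSAL_LENGTHEN
        out.append((sym, cls, dur))
    return out

def pitch_contour(n_frames, vowel_start, vowel_end, question,
                  base_hz=120.0, excursion_hz=30.0):
    """Flat F0 except over the last vowel's frames [vowel_start, vowel_end):
    a linear rise for a question, a fall otherwise; the end value is then held."""
    sign = 1.0 if question else -1.0
    span = vowel_end - vowel_start
    contour = []
    for i in range(n_frames):
        if i < vowel_start:
            frac = 0.0
        elif i < vowel_end:
            frac = (i - vowel_start + 1) / span
        else:
            frac = 1.0
        contour.append(base_hz + sign * excursion_hz * frac)
    return contour

segments = [("#", "silence", 50.0), ("aa", "vowel", 120.0), ("uq", "vowel", 100.0),
            ("t1", "stop", 80.0), ("#", "silence", 50.0)]
print([d for _, _, d in apply_duration_rules(segments)])  # [50.0, 120.0, 75.0, 120.0, 50.0]
```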

6.8 Steps for generating parameters by rule 

Having briefly reviewed the different aspects of the rule system, we now come to the
various steps for generating the parameter streams (figure 4):

1. The phoneme string is parsed, each phoneme is identified and associated with its class. 
Silence is implied both at the beginning and at the end of the phoneme string. Elementary 
rules like the singling out of geminates are applied. (Each geminate is replaced by a 
single phoneme, with increased duration). 

2. The intrinsic steady-state and transition durations for each phoneme are found from 
tables. Rules are then applied to modify them according to the context. 

3. For a given variable parameter and a given phoneme, the transition segment is first
interpolated between the steady state of the previous phoneme and the steady state
of the current one. This is done by using the interpolation type corresponding to the
phoneme class context (i.e. previous phoneme class and current phoneme class). Then
the steady-state segment is interpolated, again taking the phonemic context
into consideration.

4. Step 3 is repeated for all the variable parameters for the same phoneme.

5. Steps 3 and 4 are repeated from the first to the last phoneme.

6. Finally, the pitch contour is superimposed on the utterance.

Figure 4. Synthesis-by-rule module schematic.
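Step 1 of the procedure above can be illustrated with a small Python parser. It reads the keyboard symbols of figure 1 greedily, trying two characters before one, expands the shorthand in which a trailing ‘1’ is omitted (so ‘t’ becomes ‘t1’), adds the implied silences at both ends and collapses geminates into a single phoneme with a run-length count. The symbol subset, the '#' silence marker and the greedy strategy are assumptions for illustration; the extra duration given to a geminate would be applied later by the duration rules.

```python
# Hypothetical subset of the figure 1 symbols (the full repertoire has 57 phonemes).
SYMBOLS = {"aa", "uh", "ee", "uq", "oo", "k1", "k2", "t1", "t2",
           "p1", "b1", "m", "l", "sh", "d5"}
SILENCE = "#"  # hypothetical marker for the implied silences

def tokenize(text):
    """Greedy split of a phoneme string: two-character symbols first, then one."""
    tokens, i = [], 0
    while i < len(text):
        two = text[i:i + 2]
        if len(two) == 2 and two in SYMBOLS:
            tokens.append(two)
            i += 2
            continue
        one = text[i]
        if one in SYMBOLS:
            tokens.append(one)
        elif one + "1" in SYMBOLS:        # shorthand: 'k' stands for 'k1'
            tokens.append(one + "1")
        else:
            raise ValueError(f"unknown symbol at position {i}: {text[i:]!r}")
        i += 1
    return tokens

def merge_geminates(tokens):
    """Collapse runs of identical phonemes into (symbol, run_length)."""
    merged = []
    for tok in tokens:
        if merged and merged[-1][0] == tok:
            merged[-1] = (tok, merged[-1][1] + 1)
        else:
            merged.append((tok, 1))
    return merged

def parse(text):
    """Tokenize, add the implied leading and trailing silence, merge geminates."""
    return merge_geminates([SILENCE] + tokenize(text) + [SILENCE])

print(parse("aauqt"))  # [('#', 1), ('aa', 1), ('uq', 1), ('t1', 1), ('#', 1)]
```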

There are many important deviations from this general framework, which are taken care
of independently. For example, as mentioned earlier, the allophonic variations of /h/ have
to be handled as exceptions. Also, the burst energy contour for a stop consonant is fairly
complex, so it is generated by explicit amplitude rules corresponding to the place of
articulation of the stop.

Figure 5 shows an example of the operation of the synthesis-by-rule module. The word
‘out’, in the form of the phoneme string ‘aauqt’, is given as input. The phoneme string is
parsed into its components ‘aa’, ‘uq’ and ‘t’, and their corresponding classes (vowel, vowel
and stop) are identified. The phoneme durations (which include the steady-state durations
and the transitions) are shown. Out of a set of about 20 parameters which are varied in time
by rules, the variations of four selected ones (av, the amplitude of voicing, and F1, F2 and
F3, the first three formants) are illustrated in the figure. The spectrogram of the synthetic
speech is shown at the bottom.

Figure 6 shows the spectrograms of the word ‘Bambai’ spoken naturally (at the top) and
synthesized by our system (at the bottom) for comparison.


7. Performance 

Unfortunately, no clear standard for evaluating the performance of a synthesizer has
evolved internationally, with the leading laboratories conducting performance evaluations
suited to their own requirements. For synthesis of Indian speech, no performance evaluation
results have yet been published.

The performance of a synthesizer can be described in terms of intelligibility, quality
and speed. As we have emphasized intelligibility at the current stage, we evaluated
it in a somewhat rigorous manner with the ‘segmental intelligibility test’ as performed by
Carlson and Granstrom of KTH (Carlson et al 1990). For this, 102 VCV syllables were
synthesized from the three ‘extreme’ vowels /a/, /i/ and /u/ and the 34 consonants (including
semi-vowels) which are acceptable to our synthesizer (3 × 34 = 102). (We left out /v/ because
in Indian languages it is quite often pronounced as /w/.) The syllables were played in a random
sequence to 8 listeners who had not been much exposed to synthetic speech earlier and who
were given the opportunity to listen to each utterance only once. The listeners were asked
to identify the consonants, and the error rate in identifying them was 45.7%.

To compare with internationally reported results, the KTH synthesizer, under similar
test conditions, had an initial error rate of about 42%, which was improved to about U
over several years (Carlson et al 1990). Under less stringent closed-response (multiple-choice)
tests, the error rate in identifying consonants varied from 27% (e.g. Type-n-Talk)
to 3% (e.g. DECtalk) (Klatt 1987).

Figure 5. Synthesis-by-rule: an example.

Figure 6. Spectrograms of the word ‘Bambai’: natural speech (at the top) and synthetic
speech (at the bottom).


As mentioned earlier, there is no previously reported result on Hindi speech to compare
with. However, it can be argued that the error rates are likely to be higher for Hindi, due to
the presence of many more consonants (e.g. there are 16 stop consonants in Hindi as against 6
in English) and their greater confusability (e.g. between the retroflexed and dental
consonants, and also between the aspirated and non-aspirated consonants). All considered,
the initial error rate of our synthesizer is moderately high, but compares favourably with the
initial error rate of the KTH synthesizer under similar test conditions.

Of these errors, some are due to inexact implementation (e.g. the error rate … high), and
improvements were undertaken as per the perception test results. However, the closeness of
some phoneme pairs [e.g. /r/ and the retroflex flap (d5 in figure 1), /f/ and the voiceless
aspirated labial stop (p2 in figure 1)] puts a practical limit on the reduction of the error rate
at the syllable level. These are expected to be better perceived in a word or sentence context.

It is expected that, in general, the perception accuracy for nonsense syllables will be much
lower than the corresponding value for meaningful sentences generated by the synthesizer.
To test this, we played out the synthesized address of an unfamiliar building, and the
error rate in identifying the phonemes in this ‘global acceptance test’ was 6.8%. This
supports our assessment that, for any practical use, the intelligibility of the synthesizer is
acceptable.

8. Applications 

Speech synthesizers, especially in the form of comprehensive text-to-speech systems, have 
a wide range of potential applications. These include public announcements, voice-mode 
computer tutors (specifically, language tutors) and various aids for the speech handicapped 
and the blind. The last mentioned category of applications is described in some detail in 
Klatt’s (1987) review paper. 

However, one category of potential applications we particularly want to emphasize is
access to information over the telephone from a computerized information base. With the
current ‘information revolution’ and the explosion of computer and communication networks,
the demand for automatic access to information of public utility (e.g. railway reservation
status, flight schedules, latest stock market quotations) by dialling will rapidly increase.
Real-time updating of the information base will rule out pre-recorded messages or even
manual responses. Fast, high-quality synthesizers will be the only solution. Even
when the information base is not updated very fast, a text-to-speech synthesizer may be 
advisable because text occupies much less storage than the speech waveform. The access 
to the specific information needed can be done by a multi-level menu system and can 
be implemented by either touchtone buttons or by a small vocabulary voice recognition 
system. 

9. Conclusions 

This paper describes our source-filter model-based synthesizer against the backdrop of
the current advances in the area of speech synthesis in Indian languages. At least one
concatenation-based real-time text-to-speech system is now available (Bhaskararao et al
1994). It is expected that the work reported here will clear the way for formant synthesizers,
with better potential for quality and versatility, to appear on the scene.

The major hurdle has been successfully overcome with the development of a comprehensive
set of rules for synthesizing unlimited speech in some Indian language(s). A standard
implementation of the source-filter model was found to be adequate as the basic tool for
such synthesis. A formant-based text-to-speech synthesizer should be the next logical step.

For that, incorporation of text-to-phoneme conversion is necessary. Initiating research
and development in that direction is on our future agenda. Any readily available module
can also be plugged in.
For meaningful applications, the synthesis should be done in real or near-real time. This is a hardware engineering problem, and an acceptable solution is possible with the current state of the art.

That a rule-driven formant synthesizer can generate quality speech in Indian languages 
has been amply demonstrated by our work. However, we feel that the full potential of the 
synthesizer is far from being utilized. Taking advantage of the flexible rule structure, quality 
is being improved continually. The work includes refinement of phoneme concatenation 
rules, and better incorporation of prosody and co-articulatory effects. 

The satisfactory synthesis of the female voice poses many additional problems in any language. We are carrying out a detailed study of the female voice toward a better implementation.

Taking advantage of the versatility of the synthesis techniques employed, extension of 
the synthesizer to other Indian languages is being contemplated. 


We gratefully acknowledge the support provided through the Knowledge Based Computer 
Systems (KBCS) project in carrying out this work. We thank the late Dr D H Klatt, 
Prof. K N Stevens and other members of the Speech Communications Group, MIT, USA 
Morphological processing of Indian languages for lexical
interaction with application to spelling error correction 
Computer Vision and Pattern Recognition Unit, Indian Statistical Institute, 203
B T Road, Calcutta 700 035, India
e-mail: {sprobal, bbc}@isical.ernet.in

Abstract. An NLP system for Indian languages should have a lexical subsystem that is driven by a morphological analyzer. Such an analyzer should be able to parse a word into its constituent morphemes and obtain the lexical projection of the word as a unification of the projections of the constituent morphemes. The lexical projections considered here are the f-structures of Lexical Functional Grammar (LFG). A formalism has been proposed by which the lexicon writer may specify the lexicon at four levels. The specifications are compiled into a stored lexical knowledge base on the one hand and, on the other, into a formulation of derivational morphology called Augmented Finite State Automata (AFSA), to achieve a compact lexical representation. Aspects of the AFSA, especially its power to parse words morphologically in a computationally attractive manner, have been discussed. An additional utility of the AFSA, in the form of a spelling error corrector, has also been discussed. Bangla, or Bengali, is considered as a case study.

Implementation notes based on object-oriented programming principles have been provided.

Keywords. Natural language processing; morphological sub-system; lexical representation; augmented finite state automata; spelling corrector; object-oriented implementation.


1. Introduction 

As most Indian languages are richly inflectional, a realistic natural language processing (NLP) system for any such language should have a morphological sub-system for parsing the surface forms of words into their constituent morphemes. Such a sub-system reduces lexical redundancy and makes the lexical representation compact. In the course of NLP research, the present authors have proposed a formalism for lexical specification that leads to a compact lexical representation for an inflectional Indian language (Sengupta & Chaudhuri 1993). The formalism performs efficient parsing of words, leading to lexical projection in the form required by a Lexical Functional Grammar (LFG) (Kaplan & Bresnan 1982). While conventional lexical analysis may be depicted as in figure 1, our proposed morphosyntactic analyzer performs lexical analysis as shown in figure 2. A detailed description of the formalism may be found in Sengupta (1994).

Figure 1. Conventional lexical interaction.

The central problem of morphological parsing of words is tackled in our formalism by an Augmented Finite State Automata, or AFSA, that we have proposed. We have not only studied the formal aspects of the AFSA but have also attempted an object-oriented implementation of it. The efficacy of the AFSA as a tool for dictionary storage with spelling error detection/correction has also been investigated.

The importance of morphological processing is well established. It was originally dealt with in reasonable depth by Kaplan & Bresnan (1982); among contemporary works, the two-level approach of Koskenniemi (1983) is the best known. This approach has been extensively reviewed (Gazdar 1985) and used in implemented frameworks (Ritchie et al 1987). Our proposed recognition system is intermediate between the approaches of Kaplan et al and Koskenniemi.

In § 2, we provide a general background of lexical analysis of Indian languages. Lexical specification and representation as suggested in our scheme are taken up in § 3 and § 4 respectively, the AFSA being introduced in the latter section. Spelling correction with the AFSA is discussed in § 5, and notes on an object-oriented implementation of the lexical analyzer, especially the AFSA, are included in § 6.



Figure 2. Lexical interaction with morpho-syntactic analysis. 
2. General background

The constituent morphemes of words in an Indian language are stems, which form the essential, meaning-bearing component of words, and affixes. Depending on its position of conjoining, an affix is a prefix, an internal affix or a declension. We shall, however, keep prefixes outside the purview of the present paper. Words are produced by conjoining a stem with an appropriate number of affixes (possibly none), the last of which will be called a declension. Every constituent morpheme of a word contributes to its overall linguistic properties. The morphemes may be partitioned into several classes, with morpho-syntactic rules that are essentially non-recursive (i.e. regular) restricting the conjoinings among them. Another class of rules of generative morphology, called spelling rules, is concerned with morpho-phonemic or morpho-graphemic restructuring of symbols at the boundary of two conjoining morphemes. The spelling rules make the detection of morpheme boundaries more difficult.

The basic idea of our proposed formalism is to unite the lexicon, the lexical description 
and the surface description into an integrated system. A few major aspects of the formalism 
are: 

• Stored lexical knowledge bases for morpheme classes, their inter-relationships and 
spelling rules. 

• A Lexical Specification phase in which the lexicon writer imparts linguistic knowledge 
to build up the above knowledge bases. 

• A Representation phase in which the following two levels of lexical representation are 
generated: 

- A Comprehensive Lexicon for every morpheme, containing all relevant lexical knowledge.

- A formulation of the rules of morpho-syntax and derivational morphology in the form of an Augmented Finite State Automata (AFSA), which is capable of parsing words and detecting morpheme boundaries even when spelling deformities are present. The AFSA has pointers leading into the Comprehensive Lexicon for information retrieval.

The formalism is presented as a software tool to be used by a linguistic expert for specification of the lexicon of the target language. These specifications are to be ‘compiled’ into the proposed representation scheme.

3. Lexical specification 

The linguistic expert provides the lexical description at four levels:

1. Specification of different morpheme classes. 

2. Specification of rules of morpho-syntax. 

3. Specification of a set of spelling rules. 

4. Specification of a list of morphemes (constituting the vocabulary). 

Unless otherwise mentioned, specifications are written as LISP-style lists.
3.1 Specification of morpheme classes

The lexicon writer provides the names of the different morpheme classes together with their feature structures: a set of attribute names, each with a list of permitted attribute values (the default in brackets, $ for strings). The directives STEM and END indicate a stem class and a declension class, respectively.

An example specification for the morpheme classes VSTEM, NSTEM, VDEC, VCAUS, NCASE and DEF (representing verbal stems, nominal stems, verbal declensions, the verbal causational affix, nominal case declensions and the nominal definiteness affix, respectively):

(((VSTEM STEM END) ((VALENCY ([0] 1 2 3))
                    (PRED ($))))
 ((NSTEM STEM END) ((CAT ([material] abstract instrument place))
                    (ANIM (+ [-]))
                    (PRED ($))))
 ((VDEC END)       ((TENSE (PAST [HABIT] PRESENT))
                    (GNPH (0 1p [2p-0h] 2p-1h 3p-1h 2/3p-2h))))
 ((VCAUS END)      ((CAUS (0 [1] 2))))
 ((NCASE END)      ((CASE ([NOM] DAT LOC OBLQ POSS))))
 ((DEF END)        ((DEF ([YES] NO))))
)

3.2 Specification of rules of morpho-syntax 

Rules of morpho-syntax (Word Grammar rules) govern the formation of words from morphemes. They restrict which morpheme class may follow which other class within a word, and these restrictions have to be reflected in the AFSA. They are specified in the usual manner of specifying syntactic LFG rules, i.e. a lexical category is constituted of (denoted by →) morphemes of some classes. The rules may be annotated with f-structure schemata. An artificial morpheme class called NULL may be used as a mechanism to project lexical information when certain morphemes are missing.

An example specification for verbs and nouns: 

1. VERB → VSTEM NULL
       (↑ TENSE) = IMPER
       (↑ GNPH) = 2p-0h

2. VERB → VSTEM VDEC

3. VERB → VSTEM VCAUS VDEC

4. NOUN → NSTEM NULL
       (↑ CASE) = NOM

5. NOUN → NSTEM NCASE

6. NOUN → NSTEM DEF NCASE

7. NOUN → VSTEM VCAUS NCASE
       (↑ CAT) = gerund

8. NOUN → VSTEM VCAUS DEF NCASE
       (↑ CAT) = gerund
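Because the word grammar is regular, it can be held as plain data. The Python sketch below is our own illustration, not the authors' implementation: it encodes the eight rules above, matches a sequence of morpheme classes against them, and derives the feasible class pairs (footnote 1) that the AFSA would have to link with active edges. Treating NULL simply as "no further morpheme" is an assumption of the sketch.

```python
# Word Grammar rules of section 3.2: (category, right-hand side, annotations).
# The annotations stand for the f-structure schemata attached to each rule.
WORD_RULES = [
    ("VERB", ["VSTEM", "NULL"], {"TENSE": "IMPER", "GNPH": "2p-0h"}),
    ("VERB", ["VSTEM", "VDEC"], {}),
    ("VERB", ["VSTEM", "VCAUS", "VDEC"], {}),
    ("NOUN", ["NSTEM", "NULL"], {"CASE": "NOM"}),
    ("NOUN", ["NSTEM", "NCASE"], {}),
    ("NOUN", ["NSTEM", "DEF", "NCASE"], {}),
    ("NOUN", ["VSTEM", "VCAUS", "NCASE"], {"CAT": "gerund"}),
    ("NOUN", ["VSTEM", "VCAUS", "DEF", "NCASE"], {"CAT": "gerund"}),
]


def body(rhs):
    """Drop the artificial NULL class: it matches no surface material."""
    return [c for c in rhs if c != "NULL"]


def categorize(classes):
    """Return (category, annotations) for every rule matching a class sequence."""
    return [(cat, ann) for cat, rhs, ann in WORD_RULES if body(rhs) == list(classes)]


def feasible_pairs():
    """Class pairs (M1, M2) where M2 directly follows M1 in some rule."""
    pairs = set()
    for _, rhs, _ in WORD_RULES:
        b = body(rhs)
        pairs.update(zip(b, b[1:]))
    return pairs


print(categorize(["VSTEM"]))                    # imperative verb
print(categorize(["VSTEM", "VCAUS", "NCASE"]))  # gerund
```

The feasible pairs computed here are exactly the class-to-class links along which active edges are needed; everything else about a word's well-formedness is left to the rule annotations.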

3.3 Specification of spelling rules 

Morphological conjoining in Bangla, especially in the common dialect, fails to be strictly concatenative because in many cases symbols around the boundary of conjoining morphemes are deformed. For example, when the two morphemes dhu (verb stem “wash”) and ben (a verb declension) conjoin, the resultant ‘surface’ form is dhoben. Note that the deformation of u to o in the example occurs very near the conjoining boundary. We therefore assume that words may have two different levels of representation. The representation that we write, read, speak and hear is the surface level representation, while at the lexical level of representation, morphological conjoining is strictly concatenative. Thus the lexical form of the above example would be dhuben.

Paul (1986) has closely studied spelling deformities in Bangla, especially of the verbal 
paradigm. From those observations, backed up by some of our own, we may list the 
following salient features of spelling deformity: 

• Any deformity may be characterized entirely by the ‘atomic’ operations of addition or deletion of a symbol and replacement of one (or more) symbol(s) by one (or more) symbol(s). If 0 is used to denote the ‘absence’ of a symbol, all the above atomic operations may be expressed by the single operation of replacement.

• The innumerable individual instances of deformities may be reasonably generalized into rules with “global” (i.e. applicable during a conjoining between any two morphemes) or “local” (applicable between pairs of morphemes from certain particular morpheme class pairs) applicability.

We now introduce the concept of spelling rules and specifications thereof. 

Alphabet: The set of all characters that can constitute a lexical form (resp. surface form) of a word constitutes the alphabet $\Sigma_L$ (resp. $\Sigma_R$). $\Sigma_L$ and $\Sigma_R$ are not necessarily identical.


An l-r pair: We define $a : b$, where $a \in \Sigma_L \cup \{0\}$ and $b \in \Sigma_R \cup \{0\}$, to be an l-r pair or simply a pair. A union $a_1 | a_2 | \ldots | a_k : b_1 | b_2 | \ldots | b_k$ of pairs represents a disjunctive choice $a_i : b_i$, $1 \le i \le k$, from the $k$ possible pairs.


An R-expression (RE): This is defined as a finite string of unions of pairs. For example, $RE = (a|b):(x|x)\;\; 0:y\;\; c:0$ is an R-expression. If there are $n$ unions in an R-expression $RE$, it represents all distinct l-r pair strings of length $n$ obtained by opening out the disjunctive choices of the unions.


String matching: A string $t$ is tail matched by a string $s$ if $|t| \ge |s| = n$ and the last $n$ symbols of $t$ spell out the string $s$. A string $s$ head matches a string $t$ if $s$ is a prefix of $t$.


Spelling rule: A spelling rule is a template of the form $RE_1 + RE_2$, where $RE_1$ and $RE_2$ are R-expressions. The character + represents the abstract morpheme boundary. Here $RE^L$ and $RE^R$ denote the sets of lexical (left-hand) and surface (right-hand) strings obtained by opening out an R-expression $RE$. Intuitively, a rule $RE_1 + RE_2$ means the following. At the boundary between two morphemes, let the left morpheme tail match $RE_1^L$ and the right morpheme head match $RE_2^L$. In the surface, the matched portions of the morphemes get translated to the corresponding symbols from $RE_1^R$ and $RE_2^R$. In other words, a feasible pair of morphemes¹ $m_1 + m_2$ matches a rule $RE_1 + RE_2$ and gets translated to the surface $s_1 t_1 t_2 s_2$ if and only if $m_1 = s_1 r_1$, $m_2 = r_2 s_2$, $r_1 \in RE_1^L$, $t_1 \in RE_1^R$, $r_2 \in RE_2^L$, $t_2 \in RE_2^R$, and $r_1, t_1$ and $r_2, t_2$ are corresponding pairs.
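The definitions can be made concrete with a small Python sketch (our illustration, not the authors' code). An R-expression is held as a list of unions, each union a list of alternative (l, r) pairs, with the empty string standing for 0; `open_out` enumerates the pair strings and `apply_rule` checks the tail-match/head-match condition above and returns $s_1 t_1 t_2 s_2$. The rule u:o + b:b in the usage line is hypothetical, chosen only to reproduce the dhu + ben → dhoben example; it is not one of the authors' rules.

```python
from itertools import product

NULL = ""  # the null symbol 0: absence of a symbol


def open_out(re_unions):
    """Open out an R-expression into its (lexical, surface) string pairs.

    re_unions is a list of unions; each union is a list of alternative
    (l, r) pairs. One pair is chosen from every union.
    """
    result = []
    for choice in product(*re_unions):
        lexical = "".join(l for l, _ in choice)
        surface = "".join(r for _, r in choice)
        result.append((lexical, surface))
    return result


def apply_rule(m1, m2, rule):
    """Apply spelling rule RE1 + RE2 at the boundary m1 + m2.

    Returns the surface string s1 t1 t2 s2 for the first matching
    opening-out, or None if m1 is not tail matched by any r1 or m2 is
    not head matched by any r2.
    """
    re1, re2 = rule
    for (r1, t1), (r2, t2) in product(open_out(re1), open_out(re2)):
        if m1.endswith(r1) and m2.startswith(r2):
            s1 = m1[:len(m1) - len(r1)]   # safe even when r1 is empty
            s2 = m2[len(r2):]
            return s1 + t1 + t2 + s2
    return None


# (a|b):(x|x) 0:y c:0 opens out into two pair strings.
print(open_out([[("a", "x"), ("b", "x")], [(NULL, "y")], [("c", NULL)]]))
# [('ac', 'xy'), ('bc', 'xy')]

# Hypothetical rule u:o + b:b reproducing dhu + ben -> dhoben.
print(apply_rule("dhu", "ben", ([[("u", "o")]], [[("b", "b")]])))  # dhoben
```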

Often, as a shorthand, a spelling rule template may be written as

    ... S:T ...

where S and T are strings of identical size. This representation is a shorthand for |S| different rules formed by taking members from S and T in order.

Local and global rules: Rules may be defined to be applicable at the boundary between only some specified pairs of morpheme classes. In such cases, the rules are adorned with the pair of morpheme classes, and such a rule will be called a local rule. All unadorned rules, called global rules, are applicable at the boundaries of all morphemic pairs. In the event of a rule clash between a local and a global rule, the local rule prevails. The specification format of spelling rules is:

<Spelling Rule Template> [ at (M1, M2) ]

where M1 and M2 are morpheme classes. The “at” clause, if present, indicates a local rule. ‘0’ is the null symbol, and the symbols ‘V’, ‘C’ and ‘=’ represent the set of all vowels, the set of all consonants and the entire alphabet, respectively. Consider the following spelling rules as examples.

1. V:V + a:o

2. eiuo:eiuo + 0:y e:e

3. V:V + ieu:000 nsk:nsk at VSTEM, VDEC

4. C:C a:e C:C + E:e at VSTEM, VDEC

5. a':i + i:e y:0 a':0 at VCAUS, VDEC

6. a':a' + 0:c 0:/ ch:ch at VCAUS, VDEC

The first two of these rules are global. For a more comprehensive list of spelling rules for Bangla, see Sengupta & Chaudhuri (1993) and Sengupta (1994).
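The shorthand and the local-over-global precedence can be sketched as follows. This is our own Python illustration under simplifying assumptions: each rule is reduced to one lexical/surface string pair per side, symbols are single characters, and rules are tried in order with local rules for the class pair first, so the first match wins. Rule 2 above is expanded from its shorthand, reading its right-hand side 0:y e:e as lexical e becoming surface ye. The local rule `LOCAL` is purely hypothetical and exists only to show the precedence.

```python
NULL = ""  # the null symbol 0


def expand_shorthand(s, t):
    """Expand an S:T shorthand into |S| single pairs, taking members in order."""
    if len(s) != len(t):
        raise ValueError("S and T must have identical size")
    return [(a if a != "0" else NULL, b if b != "0" else NULL) for a, b in zip(s, t)]


def applicable(rules, m1_class, m2_class):
    """Order candidate rules: local rules for (M1, M2) first, then global ones.

    A rule is (left_lex, left_surf, right_lex, right_surf, at), where at is
    None for a global rule or an (M1, M2) class pair for a local rule.
    """
    local = [r for r in rules if r[4] == (m1_class, m2_class)]
    global_ = [r for r in rules if r[4] is None]
    return local + global_


def conjoin(m1, m2, m1_class, m2_class, rules):
    """Surface form of m1 + m2: the first matching rule wins."""
    for l1, s1, l2, s2, _ in applicable(rules, m1_class, m2_class):
        if m1.endswith(l1) and m2.startswith(l2):
            return m1[:len(m1) - len(l1)] + s1 + s2 + m2[len(l2):]
    return m1 + m2  # no rule applies: plain concatenation


# Global rule 2, eiuo:eiuo + 0:y e:e, expands into four rules.
RULES = [(l, r, "e", "ye", None) for l, r in expand_shorthand("eiuo", "eiuo")]
print(conjoin("dhu", "e", "VSTEM", "VDEC", RULES))            # dhuye

# Hypothetical local rule that blocks the y-insertion for (VSTEM, VDEC) only.
LOCAL = ("u", "u", "e", "e", ("VSTEM", "VDEC"))
print(conjoin("dhu", "e", "VSTEM", "VDEC", RULES + [LOCAL]))  # dhue
print(conjoin("dhu", "e", "NSTEM", "NCASE", RULES + [LOCAL])) # dhuye
```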

3.4 Morpheme list specification 

The final level of lexical specification involves providing a list of morphemes. 

Example: 

i) pa': ((VSTEM (VALENCY 1) (PRED 'get'))
         (NSTEM (CAT instrument) (PRED 'foot')))

ii) pa't: ((VSTEM (VALENCY 1) (PRED 'lay')))


¹ We say that a pair of morpheme classes (M1, M2) is a feasible pair if any morpheme of class M2 can follow any morpheme of class M1 in a word.
iii) mar: ((VSTEM (PRED 'die') (CAUS 0)))

iv) ma'r: ((VSTEM (VALENCY 1) (PRED 'kill')))

v) a': ((VCAUS))

vi) t'a': ((DEF))

vii) er: ((NCASE (CASE POSS)))

viii) ke: ((NCASE (CASE DAT)))

ix) e: ((NCASE (CASE LOC) (CAT place))
        (NCASE (CASE OBLQ) (CAT material))
        (VDEC))

x) E: ((VDEC (TENSE CONTINF) (GNPH 0)))

xi) te: ((VDEC (GNPH 2p-1h))
         (VDEC (TENSE CONDINF) (GNPH 0))
         (NCASE (CASE LOC) (CAT place))
         (NCASE (CASE OBLQ) (CAT material)))

xii) ten: ((VDEC (TENSE HABIT) (GNPH 2/3p-2h)))


Observe the following in the above specification: 


1. A morpheme may belong to multiple classes, as in i), ix) and xi).

2. There may be alternations in the lexical specification, as in ix) and xi).

3. A morpheme may be specified only with its class, as in v), vi) and ix).

4. There is no harm in re-specifying a default attribute value, as in xii).

5. An attribute that is not among the default attributes of the morpheme's class may also be specified, as in iii).


4. Lexical representation 

There are two levels of lexical representation: 

• A Comprehensive Lexicon for every morpheme. 

• An Augmented Finite State Automata (AFSA). 

4.1 The comprehensive lexicon 

The comprehensive lexicon is basically an indexed database of morphemes. The specification of an individual morpheme is first completed by copying the default specifications from the class to which it belongs. From this, tuples of the type

<Morpheme Class> (<Attribute Name/Path> <Attribute Value>) (...)

are created and stored in the database entry for the morpheme. There may be multiple tuples for the same morpheme.
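A minimal Python sketch of this completion step follows. It assumes, for illustration only, that the Comprehensive Lexicon is a dictionary from morpheme strings to lists of tuples (the paper does not describe the actual indexing). Defaults are the bracketed values of the § 3.1 example; PRED has no default and is therefore absent from the table.

```python
# Class defaults: the bracketed values of the section 3.1 example.
CLASS_DEFAULTS = {
    "VSTEM": {"VALENCY": "0"},
    "NSTEM": {"CAT": "material", "ANIM": "-"},
    "VDEC": {"TENSE": "HABIT", "GNPH": "2p-0h"},
    "VCAUS": {"CAUS": "1"},
    "NCASE": {"CASE": "NOM"},
    "DEF": {"DEF": "YES"},
}


def complete(alternatives):
    """Turn a morpheme's specification into Comprehensive Lexicon tuples.

    alternatives: list of (class, {attribute: value}) as in section 3.4.
    Class defaults are copied first; explicit values override them, and
    attributes outside the class's own set are kept (cf. entry iii).
    """
    tuples = []
    for cls, attrs in alternatives:
        merged = {**CLASS_DEFAULTS[cls], **attrs}
        tuples.append((cls, tuple(sorted(merged.items()))))
    return tuples


lexicon = {
    "pa'": complete([("VSTEM", {"VALENCY": "1", "PRED": "get"}),
                     ("NSTEM", {"CAT": "instrument", "PRED": "foot"})]),
    "mar": complete([("VSTEM", {"PRED": "die", "CAUS": "0"})]),
    "t'a'": complete([("DEF", {})]),
}
for morpheme, entries in lexicon.items():
    print(morpheme, entries)
```

Each alternative of a multi-class morpheme such as pa' becomes its own tuple, which is why one entry can hold several tuples.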


4.2 Introduction to the AFSA 


The Augmented Finite State Automata (AFSA) is our proposed tool for parsing words into constituent morphemes. In the normal mode of use, the AFSA accepts the surface representation of a word as input and ultimately generates index pointers into the Comprehensive Lexicon for the morphemes constituting the word. The lexical projection of the word can be recovered as the unification of the lexical projections of the constituent morphemes. We have assumed that the application of a rule at a boundary is context-free, neither affecting nor being affected by an earlier or later application of a rule at some other boundary.

The AFSA consists of a forest of Directed Acyclic Graphs (DAGs). Each DAG represents a finite state recognizer for a class of morphemes, except that there is a single DAG for all STEM type morphemes. The DAGs contain two types of edges: lexical edges or l-edges, and surface edges or s-edges. A transition along l-edges only, from the root node to a terminal node of a DAG, recognizes a lexical morpheme; a transition along s-edges, however, recognizes one surface form of some lexical morpheme. The different DAGs in the system are also interconnected by l- and s-edges. However, the inter-DAG edges are qualitatively different from intra-DAG edges: we call inter-DAG edges active and intra-DAG edges passive. The active edges encode the morpho-syntactic restrictions specified for the language by the lexicon writer, as described in § 3.2.

4.3 Formal definition of the AFSA 

The AFSA consists of a forest of DAGs, where every DAG consists of: 

a. A set of nodes representing states. Nodes are labelled as passive, l-active and/or s-active.

b. A set of l-passive edges between pairs of nodes in the same DAG.

c. A set of s-passive edges between pairs of nodes in the same DAG.

d. A set of l-active edges linking a (terminal) node of one DAG to the root node of another DAG. The term active node will be used interchangeably with terminal node.

e. A set of s-active edges linking a terminal node of one DAG to a root node of another DAG. Every s-active edge has a disjunction of one or more index pointers into the Comprehensive Lexicon. Every s-active node is associated with one or more s-active edges. Additionally, if the DAG to which an s-active node belongs represents a morpheme class with the END directive, the node has one trivial s-active edge. Unlike normal s-active edges, a trivial s-active edge does not link to any node in a different DAG; however, it has an index pointer into the Comprehensive Lexicon.

f. A set of associations connecting an l-active node with one or more s-active nodes.

Every DAG has a unique root l-node and one or more root s-nodes. The root l-node is also a root s-node.

4.4 Parsing in the AFSA 

Input: The surface representation of a word, s.

Aim: To recover pointers into the Comprehensive Lexicon for the constituent morphemes of s.

Data structures: The AFSA and a Stack of quadruples (dag, node, index, k), where node is an active node in DAG dag, index is an index pointer into the Comprehensive Lexicon, and the latest morpheme boundary is at the (k-1)-th symbol of the input.

Driving routine of the AFSA:

Step-1 dag ← STEM; node ← root l-node of the STEM DAG; k ← 1. (Here k points to the character in s currently being scanned.) Clear the Stack.

Step-2 If the end of the string has not been reached, proceed to Step-3. Otherwise, check whether node is an s-active node. If not, proceed to Step-5. Otherwise, let p ← p_t, where p_t is the trivial index pointer for node. Push (dag, node, p, k) onto the Stack and exit with success.

Step-3 If node is an s-active node, non-deterministically decide whether to make an active transition. If yes, let the chosen non-trivial s-active edge lead to node n in DAG d, and let p be the index pointer of the chosen edge. Push (d, n, p, k) onto the Stack. Make dag ← d and node ← n, and repeat Step-3. If an active transition is not taken, proceed to Step-4.

Step-4 If there is an s-passive edge in DAG dag from node node to a node n on the k-th character of s, make node ← n; k ← k + 1 and go to Step-2.

Step-5 If there is no s-active or s-passive transition possible in DAG dag from node node based on the k-th character of s, or if k points beyond the last character of s, pop (dag, node, p, k) from the Stack (where p is a dummy variable) and resume at Step-2. If the pop operation fails, exit with failure, i.e. declare the input word to be ill-formed.

Output pointers to the Comprehensive Lexicon: If the driving routine terminates successfully, the Stack contains $r \ge 1$ quadruples $(d_1, n_1, p_1, k_1), (d_2, n_2, p_2, k_2), \ldots, (d_r, n_r, p_r, k_r)$. The lexical projection of the word is the union of the $r$ sets of schemata obtained from the Comprehensive Lexicon by following the pointers $p_1, p_2, \ldots, p_r$.
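The driving routine can be sketched in Python. This is a simplified illustration under several assumptions of ours: each DAG is reduced to a trie over surface strings (so l-edges and spelling rules are left out), index pointers are plain strings such as "ke/NCASE", and the non-deterministic choices of Step-3 are explored by depth-first recursion instead of an explicit Stack, with returning from a call playing the part of the pop in Step-5. The toy lexicon uses morphemes from § 3.4, and its successor classes are read off the word grammar of § 3.2.

```python
from dataclasses import dataclass, field


@dataclass
class Dag:
    """One DAG of the AFSA, simplified to a trie over surface strings."""
    name: str
    children: list = field(default_factory=lambda: [{}])  # node -> {symbol: child}
    active: dict = field(default_factory=dict)    # node -> [(pointer, next DAG)]
    trivial: dict = field(default_factory=dict)   # node -> [pointer] (word may end)

    def add(self, surface, pointer, next_dags, may_end):
        node = 0                                  # node 0 is the root
        for ch in surface:                        # s-passive edges, one per symbol
            if ch not in self.children[node]:
                self.children.append({})
                self.children[node][ch] = len(self.children) - 1
            node = self.children[node][ch]
        for d in next_dags:                       # s-active edges to other DAGs
            self.active.setdefault(node, []).append((pointer, d))
        if may_end:                               # trivial s-active edge
            self.trivial.setdefault(node, []).append(pointer)


def parse(dags, word):
    """Yield every list of lexicon pointers that parses `word`.

    Depth-first recursion stands in for the explicit Stack: returning from
    a call plays the role of the pop in Step-5. Assumes no empty morphemes.
    """
    def walk(dag, node, k, ptrs):
        if k == len(word):                        # Step-2: end of input
            for p in dag.trivial.get(node, []):
                yield ptrs + [p]
            return
        for p, nxt in dag.active.get(node, []):   # Step-3: active transitions
            yield from walk(dags[nxt], 0, k, ptrs + [p])
        child = dag.children[node].get(word[k])   # Step-4: passive transition
        if child is not None:
            yield from walk(dag, child, k + 1, ptrs)

    yield from walk(dags["STEM"], 0, 0, [])       # Step-1: start in the STEM DAG


# Toy lexicon: surface form = lexical form (spelling rules ignored).
NEXT = {"VSTEM": ["VDEC", "VCAUS"], "NSTEM": ["NCASE", "DEF"],
        "VCAUS": ["VDEC", "NCASE", "DEF"], "DEF": ["NCASE"],
        "VDEC": [], "NCASE": []}
MAY_END = {"VSTEM", "NSTEM", "VDEC", "NCASE"}
MORPHEMES = [("VSTEM", "pa'"), ("NSTEM", "pa'"), ("VSTEM", "ma'r"),
             ("VCAUS", "a'"), ("DEF", "t'a'"), ("NCASE", "er"),
             ("NCASE", "ke"), ("VDEC", "te"), ("NCASE", "te")]

dags = {n: Dag(n) for n in ("STEM", "VDEC", "VCAUS", "NCASE", "DEF")}
for cls, form in MORPHEMES:
    target = dags["STEM"] if cls.endswith("STEM") else dags[cls]
    target.add(form, f"{form}/{cls}", NEXT[cls], cls in MAY_END)

print(list(parse(dags, "pa'ke")))      # [["pa'/NSTEM", 'ke/NCASE']]
print(next(parse(dags, "ma'ra'te")))   # first parse, as the driving routine returns
```

The generator yields every parse; taking only the first one with `next` corresponds to the routine's "exit with success", while `ma'ra'te` shows a genuine ambiguity (verb versus gerund reading of te).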

4.5 Automatic generation of AFSA 

The specifications provided by the lexicon writer are compiled into an AFSA. The compilation proceeds with two passes over the list of morphemes, along with an intermediate pass over the list of spelling rules, and is preceded by a pass over the list of morpheme classes. The second pass also consults the set of spelling rules. The compilation process is discussed in detail in Sengupta (1994). The AFSA obtained after the first pass of compilation is shown in figure 3, and the final AFSA produced is shown in figure 4.

4.6 Complexity and related issues 

The worst case time complexity of parsing in AFSA is of exponential order, primarily due 
to the non-determinisms at various stages. Let there be n DAGs in the system corresponding 




P Sengupta and B B Chaudhuri 


[Figure 3: DAGs labelled STEM, VDEC, NCASE, VCAUS and DEF]

Edges are both l-passive and s-passive. Numbered circles are active nodes. Figures on terminal nodes indicate the entry number of the Morpheme Lexicon indexed by the nodes.

Figure 3. The AFSA after the first pass of compilation.


to n-1 non-STEM morpheme classes and one common STEM DAG. The worst case first-level non-determinism occurs at an active node (which is also a passive node) of the STEM DAG where the next symbol has active transitions to every one of the n-1 non-STEM DAGs as well as a passive transition. This gives rise to an n-way non-determinism. The worst case second-level non-determinism is of order n-1 and may occur at an active node of a non-STEM




Morphological processor for Indian languages 




DAG, where there may be n-2 active transitions to the remaining n-2 non-STEM DAGs (active transitions from a DAG to itself are not possible) as well as a passive transition. Similarly, the j-th level non-determinism is of order n-j+1. Of course, while parsing a given word there may be at most n levels of non-determinism, since otherwise there would be a cyclic active transition. As all non-determinisms are multiplicative, the worst case scenario during parsing could lead to $n(n-1)\cdots 1 = n!$ non-deterministic choices for every single symbol of the word, hence giving an $O((n!)^k)$ worst case complexity for a word of length $k$.

We have observed that while there may be many active transitions from an active node of the STEM DAG, active transitions from an active node of a non-STEM DAG are fewer. This is because a non-STEM morpheme class may be followed by only a few other classes in the morpho-syntax rules, which allows us to conclude that only first-level non-determinism affects computational complexity appreciably. The pragmatic worst case complexity is therefore $O(n^k)$, which is still exponential in the word length!
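Written out, the multiplicative argument is a simple product. In our notation (j for the level of non-determinism, k for the word length, n for the number of DAGs), level j offers one passive transition plus n-j active ones. This restatement only makes the count explicit; it adds no tighter bound.

```latex
\prod_{j=1}^{n} (n - j + 1) \;=\; n \cdot (n-1) \cdots 2 \cdot 1 \;=\; n!,
\qquad
\underbrace{n! \cdot n! \cdots n!}_{k \text{ symbols}} = (n!)^{k},
\qquad
\underbrace{n \cdot n \cdots n}_{k \text{ symbols}} = n^{k}
\quad (\text{first level only}).
```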



[Figure 4: the final compiled AFSA; legend distinguishes l-active nodes, s-active nodes, l-edges and s-edges]

Figure 4. The final compiled AFSA.

Focusing only on first-level non-determinism (i.e. non-determinism localized in the STEM DAG), it is possible to reduce it further by associating a set of lookahead symbols with active edges. An active transition is taken only if the next symbol belongs to the lookahead set. A non-determinism is then encountered only when both of the following conditions hold:

• The present node is an active node.

• The total number of transitions (both active and passive) possible from the present node, with the next symbol as “lookahead”, is more than one.
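The two conditions can be checked mechanically once every active edge carries a lookahead set. In the Python sketch below, which is our own illustration, the lookahead set of an active edge is assumed to be the set of first symbols of the surface forms in the target DAG (the paper does not say how the sets are computed), and an active node is flagged as offending on a symbol when more than one move is possible on it. The surface forms are hypothetical, loosely modelled on the i-initial VCAUS and VDEC forms discussed below.

```python
def first_symbols(forms):
    """Lookahead set of an active edge: first symbols of the target DAG's forms."""
    return {f[0] for f in forms if f}


def moves_on(symbol, passive_symbols, next_classes, class_forms):
    """Number of transitions (passive plus active) possible on `symbol`."""
    passive = 1 if symbol in passive_symbols else 0
    active = sum(1 for c in next_classes if symbol in first_symbols(class_forms[c]))
    return passive + active


def offending_symbols(passive_symbols, next_classes, class_forms):
    """Symbols on which an active node remains non-deterministic."""
    candidates = set(passive_symbols)
    for c in next_classes:
        candidates |= first_symbols(class_forms[c])
    return {s for s in candidates
            if moves_on(s, passive_symbols, next_classes, class_forms) > 1}


# Hypothetical surface forms of the classes reachable from a verbal stem.
FORMS = {"VDEC": ["i", "ite", "ila", "e"], "VCAUS": ["i"]}

# An active STEM node whose stem could also continue with the symbol k.
print(offending_symbols({"k"}, ["VDEC", "VCAUS"], FORMS))   # {'i'}
```

Extending the lookahead to two symbols, as the authors did for i, would amount to comparing the first two symbols instead of one in `first_symbols`.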

While the worst case complexity still remains exponential, non-determinism is now governed by two independent events of moderate probability, so the practical complexity is quite low. It is difficult to compute the average case complexity analytically, since it is not easy to estimate the distribution of words being fed to the AFSA. We made the following short study for a moderate lexicon consisting of 565 stems of different classes. We identified the nodes in the STEM DAG that cause non-determinism, along with the offending characters.

The results are shown below:

Total number of stems                    = 565
Total number of nodes                    = 764
Total number of active nodes             = 551
Total number of offending active nodes   = 124


Of the 124 offending nodes (less than 17% of all nodes), as many as 84 had i as the offending symbol. Invariably, these nodes recognized verbal stems, and the non-determinism was between active transitions to VCAUS (since, by the effect of some spelling rules, the causational affix i becomes iy if the following verbal declension is E, realized as e in the surface) and VDEC (since there are many declensions, like i, ite, ila, etc., that begin with i). We fine-tuned our recognizer to look ahead to the second character at active nodes if the next symbol is i. This leaves us with 60 isolated offending active nodes in the AFSA. Test runs show near-linear run time complexity with respect to word length.

5. Use of AFSA in spelling correction 

A spelling corrector is a computer program that takes a presumably misspelt word as input and suggests possible correctly spelt alternatives for it. The suggested words lie in the close Hamming vicinity of the input word. In general, an erroneous word may have been generated from a correct word through one or more of the following:

• A character symbol at some position of the correct word has been substituted by some 
other character. This is the most general type of error, generally known as substitution 
error. 

• A character symbol at some position of the correct word has been dropped inadvertently. 
This error is known as a deletion error. 

• A spurious character has been inserted at some position of the correct word resulting 
in an insertion error. 

If the null character, whose external manifestation in a string is the absence of any symbol at its position, is considered to belong to the logical character set, deletion and insertion errors can be considered special cases of the substitution error. In a deletion error, the deleted character is substituted by the null character, while in an insertion error, a null character is substituted by a non-null character.

Computing the precise Hamming vicinity of a word is very difficult. Accepted spelling correction techniques assume that errors accumulate in decreasing order of probability. Usually, as in our implementation, the probability of a single error (of any of the three types) is assumed to be 0.5. At every subsequent error, the probability accumulated so far is halved. In this way, the Hamming distance between a correct word and a (possibly) erroneous one is informally related to the probability of the latter: the lower the probability, the greater the Hamming distance.
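Since insertions and deletions change the length of a word, the distance that matches this error model is, strictly, the Levenshtein (edit) distance rather than the Hamming distance, which counts substitutions only. The Python sketch below, our illustration rather than the authors' code, computes the minimum number d of single-character errors with the standard dynamic programme and converts it into the probability 0.5^d of the halving scheme just described.

```python
def edit_distance(a, b):
    """Minimum number of substitutions, deletions and insertions turning a into b."""
    prev = list(range(len(b) + 1))          # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                 # ca deleted
                           cur[j - 1] + 1,              # cb inserted
                           prev[j - 1] + (ca != cb)))   # substitution or match
        prev = cur
    return prev[-1]


def error_probability(correct, observed):
    """Probability under the halving scheme: 0.5 per error, i.e. 0.5 ** d."""
    return 0.5 ** edit_distance(correct, observed)


print(edit_distance("dhoben", "dhben"))       # 1 (deletion of o)
print(error_probability("pa're", "pa'rre"))   # 0.5 (one insertion)
```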
In principle, our spelling corrector proceeds as follows. Given an input (possibly erroneous) word, it attempts to recognize the word in the AFSA. During the process, it continuously builds up a list of possible words along with their probabilities, as well as a list of proper prefixes of such words and their probabilities. In every elementary step, yet another prefix is selected from the latter list and extended by one more character (possibly null) at the end. As a result of an elementary operation, more prefixes may be added to the list. However, if the chosen prefix has a probability below a threshold, i.e. it is already quite far from any valid word, it may be discarded. If the newly generated prefix represents a full word, it is also appended to the list of possible words.

The AFSA can be easily used as the building block of a spelling corrector. Only the 
s-edges of the AFSA need be considered by a spelling corrector. In our implementation, 
described in more detail in Das (1994), the following auxiliary data structures are used: 

• The WordNode structure is a 4-tuple (str, pos, prob, node), where str is a string of characters representing a word prefix. Recall that every valid word prefix may be associated with one or more nodes of the AFSA, the multiplicity being due to the non-determinism introduced by the active nodes/edges. In the above tuple, node is one such node, pos is an integer denoting the position in the input word with which the prefix str is associated, and prob is the probability of str computed so far.

• The Possible Word List (PWL) is a list of possible suggestions in case the input word 
is erroneous. 

• The Search List SL is a list of prefixes to be considered, actually stored as a list of 
WordNodes. 

In an elementary operation, a WordNode wn is taken out of SL and processed. During this processing, wn.node becomes the current node. Depending upon the situation, three types of search moves may be necessary:

• If wn.node is a passive node, the move PassiveSearch carries out a one-character lookahead, making transitions along passive edges.

• If, on the other hand, wn.node is an active node, the move ActiveSearch carries out the lookahead from the root nodes of the DAGs to which there are active edges from wn.node.

• It may happen that wn.pos indicates that scanning has proceeded beyond the last character of the input word. In such a case, the lookahead proceeds from wn.node until a terminal node is reached, i.e. a valid word is detected. This is taken care of by the move ForwardSearch.

The overall spelling corrector algorithm begins with an input word inputWord $= c_0 c_1 \ldots c_{size-1}$, an empty PWL, and SL containing the single WordNode wn = ("", 0, 1.0, n), where n is the root node of the STEM DAG. Note that the initial probability wn.prob is 1.0.

The main loop of the spelling corrector is as follows: 

while (SL is not empty) {
    wn = some item taken out of SL;
    /* Any input symbols not yet read count as insertion errors */
    if (wn.node is terminal &&
        wn.prob * 0.5**(size - wn.pos) > threshold) {   /* ** is to-the-power */
        /* Word recognized at the node is close enough */
        add wn.str to PWL;
    }
    if (wn.prob > threshold) {   /* Proceed only if probability is sufficient */
        if (wn.node is active) ActiveSearch(wn);
        if (wn.pos < size) {     /* Input symbols still remain to be associated */
            PassiveSearch(wn);
        } else {
            ForwardSearch(wn);
        }
    }
    throw away wn;
}


It is appropriate to describe at least the PassiveSearch module here. 

/* || denotes string concatenation; size is the length of inputWord */
void PassiveSearch(WordNode wn)
{
    if (found(n1 = passive transition on inputWord[wn.pos] from wn.node)) {
        /* Normal proceed. Do not alter probability */
        add WordNode(wn.str||inputWord[wn.pos], wn.pos+1, wn.prob, n1) to SL;
    }
    if (wn.pos < size-1 &&
        found(n1 = passive transition on inputWord[wn.pos+1] from wn.node)) {
        /* Possibility of detection of an Insertion error.
           The inserted character is inputWord[wn.pos].
           This is notionally deleted. */
        add WordNode(wn.str||inputWord[wn.pos+1], wn.pos+2, wn.prob/2, n1) to SL;
    }
    k = number of passive edges from wn.node;
    for (i = 0; i < k; i++) {   /* for all edges do */
        n1 = node reached following i-th passive edge from wn.node;
        c = character on the i-th passive edge;
        create wn1 = WordNode(wn.str||c, wn.pos, wn.prob, n1);
        /* wn1 shall take care of Deletion and/or Substitution error.
           Consolidation done below. */
        if (n1 is active node) ActiveSearch(wn1);
        /* Taking active closure of n1 before handling n1 itself */
        if (found(n2 = passive transition on inputWord[wn.pos] from n1)) {
            /* Taking care of Deletion of c at wn.pos-th position */
            add WordNode(wn.str||c||inputWord[wn.pos], wn.pos+1, wn.prob/2, n2) to SL;
        }
        if (wn.pos < size-1 &&
            found(n2 = passive transition on inputWord[wn.pos+1] from n1)) {
            /* Taking care of Substitution of c by inputWord[wn.pos]
               at wn.pos-th position */
            add WordNode(wn.str||c||inputWord[wn.pos+1], wn.pos+2, wn.prob/2, n2) to SL;
        }
    } /* end for */
} /* end function PassiveSearch */
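The main loop, PassiveSearch and ForwardSearch can be tried out on a toy lexicon. The C++17 sketch below is a simplified, passive-only rendering under stated assumptions. A plain trie (the hypothetical TrieNode) stands in for the AFSA, so there are no active nodes and ActiveSearch is omitted. The names insertWord, step and suggest are illustrative. Unlike the pseudo-code, a terminal node is accepted when wn.prob, halved once more for every unread input character, exceeds the threshold.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <map>
#include <memory>
#include <set>
#include <string>
#include <vector>

// Hypothetical passive-only trie standing in for the AFSA (no active nodes).
struct TrieNode {
    bool terminal = false;                           // a valid word ends here
    std::map<char, std::unique_ptr<TrieNode>> next;  // passive edges
};

// The 4-tuple (str, pos, prob, node) described in the text.
struct WordNode {
    std::string str;       // word-prefix built so far
    std::size_t pos;       // position reached in the input word
    double prob;           // probability of str, halved for every error
    const TrieNode* node;  // current node
};

void insertWord(TrieNode& root, const std::string& word) {
    TrieNode* n = &root;
    for (char c : word) {
        auto& child = n->next[c];
        if (!child) child = std::make_unique<TrieNode>();
        n = child.get();
    }
    n->terminal = true;
}

// Passive transition on c from n, or nullptr if there is none.
const TrieNode* step(const TrieNode* n, char c) {
    auto it = n->next.find(c);
    return it == n->next.end() ? nullptr : it->second.get();
}

std::set<std::string> suggest(const TrieNode& root, const std::string& input,
                              double threshold) {
    assert(threshold > 0.0);  // every error halves prob, so the search terminates
    const std::size_t size = input.size();
    std::set<std::string> pwl;  // Possible Word List
    std::vector<WordNode> sl;   // Search List
    sl.push_back(WordNode{"", 0, 1.0, &root});
    while (!sl.empty()) {
        WordNode wn = sl.back();
        sl.pop_back();
        // Assumption: unread input characters count as insertions against wn.prob.
        double finalProb = wn.prob * std::pow(0.5, static_cast<double>(size - wn.pos));
        if (wn.node->terminal && finalProb > threshold) pwl.insert(wn.str);
        if (wn.prob <= threshold) continue;
        if (wn.pos < size) {  // PassiveSearch
            const char c0 = input[wn.pos];
            const bool hasNext = wn.pos + 1 < size;
            const char c1 = hasNext ? input[wn.pos + 1] : '\0';
            if (const TrieNode* n1 = step(wn.node, c0)) {  // normal move
                sl.push_back(WordNode{wn.str + c0, wn.pos + 1, wn.prob, n1});
            }
            if (hasNext) {
                if (const TrieNode* n1 = step(wn.node, c1)) {  // c0 was inserted
                    sl.push_back(WordNode{wn.str + c1, wn.pos + 2, wn.prob / 2, n1});
                }
            }
            for (const auto& [c, child] : wn.node->next) {
                const TrieNode* n1 = child.get();
                if (const TrieNode* n2 = step(n1, c0)) {  // c is missing from the input
                    sl.push_back(WordNode{wn.str + c + c0, wn.pos + 1, wn.prob / 2, n2});
                }
                if (hasNext) {
                    if (const TrieNode* n2 = step(n1, c1)) {  // c0 was typed for c
                        sl.push_back(WordNode{wn.str + c + c1, wn.pos + 2, wn.prob / 2, n2});
                    }
                }
            }
        } else {  // ForwardSearch: lexicon characters past the input end are deletions
            for (const auto& [c, child] : wn.node->next) {
                sl.push_back(WordNode{wn.str + c, wn.pos, wn.prob / 2, child.get()});
            }
        }
    }
    return pwl;
}
```

On the lexicon {cat, cart, care, dog} with a threshold of 0.3, at most one error is tolerated. Under that setting the input "cat" yields {cat, cart} and "cre" yields {care}. Each error move is paired with a one-character lookahead. As a result, this sketch does not repair an error in the final character of a non-terminal prefix: "cax", for example, gets no suggestion.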

Consider, as an example, the input pa’re and the tiny AFSA shown in the figure. (Although pa’re is a valid Bangla word, the AFSA of the figure does not recognize it.) With a threshold of 0.125 (i.e. a total error not exceeding two), our spelling corrector gives the suggestions mere, ma’re, mare, ma’ra’, mara’, pa’te, pete, pa’ten, pa’y and pa’ke.




6. Implementation notes 

The major objects (classes, in C++ parlance) used are:

PassiveNode: This class represents a passive node of the AFSA. It contains lexical and surface back passive edges to the previous node. There is a container of pairs (c, lp), where c is a symbol from the lexical alphabet and lp is a pointer to another node; the items of this container are thus l-passive edges. There is another, similar container for s-passive edges. Member functions include forward/reverse lexical/surface transition procedures.

ActiveNode: Inherits from PassiveNode. There are two additional containers for l- and s-active edges, each item of which is an ActiveEdge. There are member functions for making l- and s-active transitions. Another important data member of ActiveNode is pLexList, a pointer to a linked list of pointers to the Lexicon class. All members of the list are assumed to point disjunctively to different lexical entries.

ActiveEdge: A pair (pLook, pNode), where pLook is a pointer to a set of “lookahead” symbols and pNode is the root of the DAG to which the active transition is taken.

Lexicon: This class does not have any important member data. It serves the purpose 
of interacting with the lexicon through database management routines to draw lexical 
projections of morphemes. 

Other classes used are FStructure, Pair, etc., along with member functions 
to perform Locate and Merge operations of LFG. These classes are discussed in more 
detail in Sengupta (1994). 
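For orientation, the node classes above might be laid out in C++ as follows. Only pLexList, pLook and pNode come from the text. The container types and the member names lPassive, sPassive, lActive, sActive, lexicalTransition and lexicalActiveTransition are hypothetical. The rule that an active transition fires when the symbol belongs to the lookahead set is also an assumption. FStructure and Pair are omitted, and the database-backed Lexicon is left opaque.

```cpp
#include <cassert>
#include <list>
#include <set>
#include <utility>
#include <vector>

class Lexicon;  // interface to the database-backed lexicon (opaque here)

// Hypothetical layout of a passive AFSA node.
class PassiveNode {
public:
    virtual ~PassiveNode() = default;
    // l-passive and s-passive edges: (symbol, pointer to another node)
    std::vector<std::pair<char, PassiveNode*>> lPassive, sPassive;

    // Forward lexical transition on c; nullptr if no l-passive edge carries c.
    PassiveNode* lexicalTransition(char c) const {
        for (const auto& [symbol, target] : lPassive)
            if (symbol == c) return target;
        return nullptr;
    }
};

// Active edge (pLook, pNode): lookahead symbols and the root of the target DAG.
struct ActiveEdge {
    const std::set<char>* pLook;
    PassiveNode* pNode;
};

class ActiveNode : public PassiveNode {
public:
    std::vector<ActiveEdge> lActive, sActive;  // l- and s-active edges
    std::list<Lexicon*>* pLexList = nullptr;   // disjunctive lexical entries

    // Assumed semantics: take the active edge whose lookahead set contains c.
    PassiveNode* lexicalActiveTransition(char c) const {
        for (const ActiveEdge& e : lActive)
            if (e.pLook != nullptr && e.pLook->count(c) > 0) return e.pNode;
        return nullptr;
    }
};
```

Making ActiveNode a subclass keeps the passive transition procedures available on every node, matching the statement that ActiveNode inherits from PassiveNode.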


Object returned by the lexical sub-system: As shown in figure 2, the object returned by the lexical sub-system has two major components: the word category of the input word being parsed and the f-structure. In practice, the structure of the object returned by the lexical component is not exactly so, because in our proposed system there is an intervening supra-lexical layer (Sengupta & Chaudhuri 1996; Sengupta 1994) between the syntactic component and the lexical component. The components of the object actually returned are as follows:

• The input word itself. 

• Word Category: the tentative word category of the input word.

• A list of constituent morphemes.

• A list of (codes for) morpheme classes M1, M2, ... for the constituent morphemes of the word. Thus the details of the morpho-syntactic composition of the input word are also returned.

• A list of (pointers to) f-structures F1, F2, ... for the constituent morphemes.

• A (pointer to the) unified f-structure of the word.

• A (pointer to the) semantic clause of the word.

Interaction with the lexical sub-system is carried out through the following function:

int primitiveLexAnalysis(char *word,             // Input word
                         LexPrimitive *lexPrim); // Returned object
// Returns TRUE if parse successful, FALSE otherwise

7. Discussions 

The formalism proposed here has been tried out on a medium-sized lexicon in Bangla consisting of about three hundred verb stems, one thousand nominal stems and a few stems of some other classes. There are about forty verb declensions and about ten case declensions, capable of generating 2,400 VERBs (causative and non-causative) and 10,000 NOUNs. The parsing results obtained were very satisfactory, with near-linear recognition time complexity.

The lexical projection of a compound stem produced by the euphony of two stems can be derived from those of the conjoining stems. As a result, euphony is a more attractive subject of study than prefixes. If an AFSA is used to perform rule-based de-euphonization, it may have too many self-loops, which reduces efficiency. However, at the level of sentential syntax analysis, considerable advantages may be derived from rule-based de-euphonization. We have made some initial studies (Panda 1992), but it is too early to report any major achievement.

The biggest advantage of our formalism is the compactness and lucidity of representation. The recognizers are finite-state networks, a well-studied formalism. The representation scheme is easy to understand and quite flexible. The underlying LFG formalism permits a broad generalization across morpheme boundaries, as in the last example taken up in § 4.5 (i.e. mara’y). Comparing our formalism with Koskenniemi’s two-level approach, we find that the latter does not incorporate morpho-syntactic restrictions in the automata themselves.

Finally, the potential power of the AFSA as the building block of a spelling corrector 
has been amply demonstrated. 
Abstract. In this paper we define nondeterministic decision tables to describe process control rules that are specified imprecisely. An example of such a control rule is “if temperature is high and pressure is low then open valve slightly”. The definition of nondeterministic decision tables is based on fuzzy sets and the associated logic. We show how nondeterministic decision tables are interpreted and how the specified actions are executed based on measured values of independent control variables. When nondeterministic decision tables are formulated based on rules given by experts, it is necessary to determine whether they have any redundant, missing or contradictory rules. We define these terms for nondeterministic decision tables and show how such logical errors can be detected in certain cases.

Keywords. Decision tables; fuzzy logic; logical errors; process control rules. 


1. Introduction 

Decision tables (Rajaraman 1987, 1991) have been used extensively to specify complex logical procedures. The conditions specified in decision tables (DTs) are boolean, i.e. they can take either a ‘yes’ or a ‘no’ answer. Such DTs cannot handle situations where conditions are not precisely specifiable. A decision rule such as “If temperature is high and pressure is low then open valve slightly”, for instance, is intractable in boolean logic. Very often experienced plant operators enunciate decision rules in such loose, nondeterministic terms. In this paper we define Nondeterministic Decision Tables (NDT) to reflect imprecisely defined decision rules. The definition is based on fuzzy sets and the associated logic (Zadeh 1965; Kaufmann 1975). When a DT is formulated based on rules given by human experts, it is necessary to determine whether the rules are complete and whether any of them are redundant or contradictory. These concepts are well defined for DTs based on boolean logic (Rajaraman 1987). In this paper these terms are defined for an NDT. The ideas suggested here are of particular relevance in defining DTs for process control.
Decision table formulations based on fuzzy sets have been proposed earlier by Francioni & Kandel (1988). The fuzzy decision tables (FDT) proposed by them are not suitable for specifying nondeterministic rules, because in their formulation

• the input space is segmented into blocks via fuzzy variables defined on it. Inputs falling in a block cannot be differentiated, and actions change abruptly across block boundaries. Human judgement in general, and process control in particular, do not undergo abrupt changes; one finds a gradual phasing out of one action into another.

• Very often one finds the following phenomena in a human-controlled process:

1. The action itself is fuzzy, i.e. the ‘strength’ of an action depends on the ‘strength’ with which a particular input satisfies a particular decision rule.

2. The action is the weighted mean of the actions entailed by the control rules satisfied by the inputs.

These aspects of process control cannot be supported by the FDT model suggested earlier (Francioni & Kandel 1988). An added advantage of the model suggested in this paper is that it can be reduced to an FDT (Francioni & Kandel 1988) merely by imposing a threshold, through a process described later. Further, the concepts of incompleteness, redundancy and contradiction in fuzzy decision tables have not been analyzed earlier in the literature.

This paper is organized as follows. In the next section we describe how nondeterministic decision tables are defined and interpreted. In § 3 we define incomplete, redundant and contradictory specifications in relation to NDTs and show how such errors are detected. The last section states the conclusions of the paper.

2. Nondeterministic decision tables 

A DT, be it crisp or fuzzy, defines a logical procedure by means of a set of conditions 
and their related actions. A DT is divided into four quadrants. All conditions relevant to the procedure are listed in the condition stub (sometimes the condition listing carries over to the ‘condition entry’ column; such DTs are called extended entry DTs (EEDT)). NDTs are almost always EEDTs. All actions relevant to the procedure are listed in the action stub. The next step is to determine which conditions taken together should lead to which action(s), and to record them on the right-hand side as a series of decision rules.

Example 1. Consider the DT of figure 1. The control process here monitors the opening 
of a valve contingent on certain measurements. 

An interesting observation is that this NDT has one boolean condition as well. This should not be surprising: fuzzy logic, being an extension of boolean logic, works in that restricted domain as well.

Let us take one judgemental rule from the table and consider the problem of giving a proper interpretation to the fuzzy terms involved and of interpreting the rule using a suitable logic.

If temperature is high and pressure is low and valve ‘B’ is closed, then open valve ‘A’ 
slightly. 
Figure 1. A decision table for controlling a valve.


The fuzzy linguistic qualifiers involved here are high, low and slightly. Before we can 
attempt to give an interpretation to these qualitative terms we must decide the universe for 
each term, i.e. the range of the variable that these linguistic terms qualify. In this case the 
variables are temperature, pressure and extent of valve opening. Let us, for illustration, 
choose arbitrary ranges for these variables. 

Range of temperature T is [150, 1000]°C.

Range of pressure P is [1, 10] atm.

Extent of valve A opening is in the range [0, 1].

Extent of valve B opening is in [0, 1]. (For simplicity, valve B is assumed to have boolean states.)

On these universes the linguistic terms used can be defined using fuzzy sets (Zadeh 1965). These fuzzy definitions are flexible and user-defined, e.g.

(T is high) is a fuzzy set Th whose membership function is

$$m_{T_h}(T) = 1 + \frac{2}{\pi}\arctan\left(\frac{T - 1000}{b}\right),$$

where b = 212.5. This choice of b gives $m_{T_h}(787.5) = 0.5$.
The function $\frac{2}{\pi}\arctan x$ rises quickly from 0 to 0.5 as x goes from 0 to 1, but only slowly from 0.5 towards 1 as x grows beyond 1. Because the argument (T - 1000)/b is never positive on the temperature range, $m_{T_h}$ shows this behaviour mirrored: it climbs slowly at low temperatures and steeply near 1000°C. This is exploited to model the fact that humans are able to differentiate the degree of ‘hotness’ better towards the hottest end of a temperature range than in the temperate zone. Correspondingly, the membership grade of T in Th changes rapidly towards the higher end and varies rather slowly in the temperate zone.
(P is low) is a fuzzy set Pl given by

$$m_{P_l}(P) = 1 - \frac{2}{\pi}\arctan\left(\frac{P - 1}{d}\right), \qquad d = 1.5.$$

This choice of d gives $m_{P_l}(2.5) = 0.5$.

(Open valve slightly) is a fuzzy set Os given by

$$m_{O_s}(y) = \exp\left[-0.5\left(\frac{y - 0.25}{g}\right)^2\right].$$
An exponential (Gaussian) membership function is chosen for the fuzzy set open valve “slightly” to ensure that it is symmetric about y = 0.25. The parameter g gives good control over the width of the set.

Using these definitions of the linguistic qualifiers, one can cast the nondeterministic decision rule into the following fuzzy logic function (Zadeh 1965; Kaufmann 1975):

$$(T \in T_h)\ \mathrm{AND}\ (P \in P_l)\ \mathrm{AND}\ (\text{valve B closed}) \Rightarrow (y \in O_s),$$

where AND is the fuzzy conjunction operator and $\Rightarrow$ is the fuzzy implication operator. Various definitions of these operators are possible. The definitions used by us are

$$(T \in T_h)\ \mathrm{AND}\ (P \in P_l) = \min[m_{T_h}(T), m_{P_l}(P)],$$

$$(P \in P_l) \Rightarrow (y \in O_s) = \min[m_{P_l}(P), m_{O_s}(y)].$$

The crisp condition on valve B contributes a grade of 1 when it holds. Thus this rule gives an output fuzzy set C with the membership function

$$m_C(y) = \min[m_{O_s}(y), m_{T_h}(T), m_{P_l}(P)]. \qquad (1)$$
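Equation (1) can be evaluated numerically. This C++ sketch uses the membership functions defined above. The width g of Os is not given in the text, so the value 0.1 is a placeholder assumption. The function names mTh, mPl, mOs and mC are illustrative.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

const double kPi = std::acos(-1.0);

// (T is high), b = 212.5
double mTh(double T) { return 1.0 + (2.0 / kPi) * std::atan((T - 1000.0) / 212.5); }

// (P is low), d = 1.5
double mPl(double P) { return 1.0 - (2.0 / kPi) * std::atan((P - 1.0) / 1.5); }

// (open valve slightly); g is not given in the text, 0.1 is a placeholder.
double mOs(double y, double g = 0.1) {
    const double z = (y - 0.25) / g;
    return std::exp(-0.5 * z * z);
}

// Equation (1): min conjunction followed by min implication. The crisp
// condition "valve B closed" is assumed to hold (grade 1), so it drops out.
double mC(double y, double T, double P) {
    return std::min({mOs(y), mTh(T), mPl(P)});
}
```

For instance, at T = 787.5°C and P = 2.5 atm both condition grades are 0.5. The output set is therefore Os clipped at 0.5, and mC(0.25, 787.5, 2.5) returns 0.5 up to rounding.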

If the action corresponding to this rule were a crisp singleton set, the output set C would be a fuzzy set containing a single element, with membership grade or ‘confidence value’ given by (1).

With this analysis one can address the problem of giving a meaningful interpretation to 
the NDT as a whole. For that we need some more definitions. 

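T(low): $T_l$: $m_{T_l}(T) = 1 - \frac{2}{\pi}\arctan\left(\frac{T - 150}{150}\right)$

T(normal): $T_n$: $m_{T_n}(T) = \exp\left[-0.5\left(\frac{T - 575}{362.5}\right)^2\right]$

P(high): $P_h$: $m_{P_h}(P) = 1 + \frac{2}{\pi}\arctan\left(\frac{P - 10}{2.25}\right)$

P(normal): $P_n$: $m_{P_n}(P) = \exp\left[-0.5\left(\frac{P - 5.5}{0.75}\right)^2\right]$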

The algorithm for analysing the NDT is as follows: 


1. Generate a matrix H of dimension p × n, where

p = number of entries in the condition stub,
n = number of decision rules,

$h_{ij} = 1$ if the (i, j) entry of the DT is a ‘don’t care’ entry, and $h_{ij} = 0$ if it is not.

For figure 1 the H matrix is

$$H = \begin{bmatrix} 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 & 0 \end{bmatrix}.$$

2. For the given input data (in our case temperature, pressure and the condition of valve B), evaluate the decision rules and fill the H matrix as follows:

$h_{ij} = m_{A_{ij}}(x)$, where $A_{ij}$ is the fuzzy set corresponding to the i-th condition in the j-th decision rule and x is the input corresponding to $A_{ij}$; ‘don’t care’ entries keep the value 1.

For example, $h_{11} = m_{A_{11}}(x) = m_{T_n}(T)$, since $A_{11} = T_n$ and $x = T$.

Let T = 200°C, P = 4 atm and valve B = closed; then the H matrix is

$$H = \begin{bmatrix} 0.58 & 0.17 & 0.17 & 0.80 & 0.80 \\ 0.13 & 0.23 & 0.30 & 0.23 & 0.30 \\ 1.0 & 1.0 & 1.0 & 1.0 & 1.0 \end{bmatrix}.$$

3. AND the entries in each of the columns. This conjunction is the fuzzy conjunction (min) relation. The column operation leads to a 1 × n matrix, which is the output fuzzy set C'. If the actions are themselves fuzzy, then C' is a fuzzy set of fuzzy subsets. For the H matrix above,

$$C' = [\,0.13 \;\; 0.17 \;\; 0.17 \;\; 0.23 \;\; 0.30\,].$$

After evaluating C', the action that has the maximum membership grade in C' is performed. If we assign discrete values in [0,1] to the valve opening, then that value is applied. We define “do not open” as 0, open slightly as 0.25, open halfway as 0.5 and open fully as 1.0. In the current problem rule 5 is satisfied, as its membership grade is maximum, and the valve is not opened.

Consider the following situation. The input condition is:

T = 450°C, P = 9.5 atm, valve B closed.

The H matrix in this case is

$$H = \begin{bmatrix} 0.94 & 0.23 & 0.23 & 0.29 & 0.29 \\ 0.0 & 0.86 & 0.11 & 0.86 & 0.11 \\ 1 & 1 & 1 & 1 & 1 \end{bmatrix},$$

$$C' = [\,0.0 \;\; 0.23 \;\; 0.11 \;\; 0.29 \;\; 0.11\,].$$

$C'_4$ is maximum. Thus rule 4 is satisfied and the controller opens the valve halfway. Had this been a boolean DT, it would have adjudged T as normal ($m_{T_n}(T) = 0.94 > 0.5$; $m_{T_h}(T), m_{T_l}(T) < 0.5$) and P as high, invoked the ‘else’ rule and done nothing. The fuzzy controller, on the other hand, makes a weighted judgement, taking all parameters into account in parallel. Although the temperature is lower than normal, it decides that the exceptionally high value of pressure demands immediate action. This makes a strong case for judgemental control of processes.
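Steps 1 to 3 can be reproduced numerically for example 1. In this C++ sketch the contents of the five rules are an assumption inferred from the H matrices above, since figure 1 is not reproduced here. Under that reading, rule 1 pairs T normal with P normal, and rules 2 and 3 pair T high with P high and P low respectively. Rules 4 and 5 pair T low with P high and P low. Every rule except rule 2 (the ‘don’t care’ entry) requires valve B to be closed.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

const double kPi = std::acos(-1.0);

// Membership functions of example 1 (T in deg C, P in atm).
double mTh(double T) { return 1.0 + (2.0 / kPi) * std::atan((T - 1000.0) / 212.5); }
double mTl(double T) { return 1.0 - (2.0 / kPi) * std::atan((T - 150.0) / 150.0); }
double mTn(double T) { const double z = (T - 575.0) / 362.5; return std::exp(-0.5 * z * z); }
double mPh(double P) { return 1.0 + (2.0 / kPi) * std::atan((P - 10.0) / 2.25); }
double mPl(double P) { return 1.0 - (2.0 / kPi) * std::atan((P - 1.0) / 1.5); }
double mPn(double P) { const double z = (P - 5.5) / 0.75; return std::exp(-0.5 * z * z); }

// One decision rule (one column of the table).
struct Rule {
    double (*temperature)(double);
    double (*pressure)(double);
    bool needsValveBClosed;  // false encodes the 'don't care' entry
};

// Assumed rule contents, inferred from the H matrices in the text.
const std::vector<Rule> kRules = {
    {mTn, mPn, true}, {mTh, mPh, false}, {mTh, mPl, true},
    {mTl, mPh, true}, {mTl, mPl, true}};

// Steps 2 and 3: fill each column of H and AND (min) it into C'.
std::vector<double> outputSet(double T, double P, bool valveBClosed) {
    std::vector<double> grades;
    for (const Rule& r : kRules) {
        const double h1 = r.temperature(T);
        const double h2 = r.pressure(P);
        const double h3 = (!r.needsValveBClosed || valveBClosed) ? 1.0 : 0.0;
        grades.push_back(std::min({h1, h2, h3}));
    }
    return grades;
}

// Index (0-based) of the rule with the maximum grade; its action is executed.
std::size_t selectedRule(const std::vector<double>& grades) {
    assert(!grades.empty());
    return static_cast<std::size_t>(
        std::max_element(grades.begin(), grades.end()) - grades.begin());
}
```

Under these assumptions, outputSet(200, 4, true) gives grades within 0.01 of the C' reported for the first situation and selects rule 5. outputSet(450, 9.5, true) selects rule 4, matching the discussion above.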

We will now examine a slightly more complicated example. 

Example 2. The NDT of figure 2 controls a system of valves that maintains the level of a reservoir. The inlet valve is uncontrollable. The outlet valves are A, B and C, which have three possible states: normal (0.5), increased (1.0) and decreased (0.0). It is also desirable that flow be maintained as high as possible, with priority A > B > C.
Figure 2. Reservoir level control (Inc.: increase; Nor.: normal; Dec.: decrease).


The linguistic terms positive (pos.) and negative (neg.) are given by fuzzy sets defined on the real axis:

$$m_{pos}(x) = \begin{cases} 0 & x < -1 \\ 0.5 + \frac{2}{\pi}\arctan(x) & |x| \le 1 \\ 1 & x > 1 \end{cases} \qquad m_{neg}(x) = 1 - m_{pos}(x).$$

The universes of the various variables involved can be suitably normalized to conform to this universal definition. This is simpler than defining separate sets for each variable, since the same linguistic terms then qualify all variables.

Let us prescribe some arbitrary specifications:

Minimum specified flow = 30; normalized flow in the interval (C1): $\bar{f} = (f - 30)/20$
Cumulative flow over the last three intervals $f_{cum}$; normalized cumulative flow over the last 3 intervals (C3): $\bar{f}_{cum} = (f_{cum} - 90)/40$
Specified height of reservoir = 5; normalized height (C4): $\bar{y} = (y - 5)/2$

At a particular sampling instant let the measured input values be:

f = 27, $\bar{f} = -0.15$
$f_{cum} = 77$, $\bar{f}_{cum} = -0.32$
y = 6.5, $\bar{y} = 0.75$
Using these values and the definitions of pos. and neg. given above, the H matrix for this NDT becomes

$$H = \begin{bmatrix} 0.4 & 0.6 & 0.6 & 1 & 1 & 1 & 1 \\ 1 & 1 & 1 & 1 & 1 & 0 & 0 \\ 1 & 1 & 1 & 0.7 & 0.7 & 0.7 & 0.7 \\ 1 & 0.91 & 0.09 & 0.91 & 0.09 & 0.91 & 0.09 \end{bmatrix},$$

$$C' = [\,0.4 \;\; 0.6 \;\; 0.09 \;\; 0.7 \;\; 0.09 \;\; 0.0 \;\; 0.0\,].$$

The corresponding fuzzy action can be determined in two ways:

1. The output corresponding to the rule with the highest membership grade in C' is executed. In this case the action is the one corresponding to $C'_4$:

Valve A: normal (0.5)
Valve B: decreased (0.0)
Valve C: decreased (0.0)
Alarm 1: activated


2. The actions A1, ..., A5 can be considered independent of each other, and each can be set to the mean of the values prescribed by the rules, weighted by their grades in C'. Alarms, however, have to be activated with a threshold acting on their membership grade; the threshold chosen here is 0.5. In this case the controller action is

Valve A: $\frac{1 \times 0.4 + 0.5 \times 0.6 + 0.5 \times 0.09 + 0.5 \times 0.7 + 0 + 0 + 0}{0.4 + 0.6 + 0.09 + 0.7 + 0.09 + 0 + 0} = \frac{1.095}{1.88} \approx 0.58$
Valve B: 0.24
Valve C: 0.10
Alarm 1: activated
Alarm 2: not activated
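Both methods can be checked for example 2 with a short C++ sketch. The rule entries and the valve A setting of each rule are assumptions read off the H matrix and the valve A computation above, since figure 2 is not reproduced. Condition C2 is treated as a crisp yes/no condition that currently holds. The valve B, valve C and alarm rows are omitted.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

const double kPi = std::acos(-1.0);

// 'Positive' on a normalized variable; 'negative' is its complement.
double mPos(double x) {
    if (x < -1.0) return 0.0;
    if (x > 1.0) return 1.0;
    return 0.5 + (2.0 / kPi) * std::atan(x);
}

// Entry codes: +1 = pos. (or Y), -1 = neg. (or N), 0 = don't care.
struct Rule {
    int cond[4];    // conditions C1..C4
    double valveA;  // valve A setting prescribed by the rule
};

// Assumed reading of figure 2, inferred from the H matrix and the valve A sum.
const std::vector<Rule> kRules = {
    {{+1, +1, 0, 0}, 1.0},  {{-1, +1, 0, +1}, 0.5}, {{-1, +1, 0, -1}, 0.5},
    {{0, +1, -1, +1}, 0.5}, {{0, +1, -1, -1}, 0.0}, {{0, -1, -1, +1}, 0.0},
    {{0, -1, -1, -1}, 0.0}};

// Grade of one entry, given m = grade of 'pos.' (or of 'Y') for the input.
double entryGrade(int code, double m) {
    if (code == 0) return 1.0;
    return code > 0 ? m : 1.0 - m;
}

// C': min over the four entries of every rule (AND of each column of H).
std::vector<double> outputSet(const double m[4]) {
    std::vector<double> grades;
    for (const Rule& r : kRules) {
        double g = 1.0;
        for (int i = 0; i < 4; ++i) g = std::min(g, entryGrade(r.cond[i], m[i]));
        grades.push_back(g);
    }
    return grades;
}

// Method 2: mean of the rules' valve A settings, weighted by their grades in C'.
double weightedValveA(const std::vector<double>& grades) {
    double num = 0.0, den = 0.0;
    for (std::size_t j = 0; j < grades.size(); ++j) {
        num += grades[j] * kRules[j].valveA;
        den += grades[j];
    }
    assert(den > 0.0);
    return num / den;
}
```

With the normalized inputs of the sampling instant above, the sketch gives C' within 0.01 of the reported values. It picks rule 4 under method 1 and yields about 0.58 for valve A under method 2.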


Consider the same process controlled by a boolean DT. In this DT the sets positive and negative would be defined as

$$m_{pos}(x) = \begin{cases} 1 & x > 0 \\ 0 & x \le 0 \end{cases} \qquad m_{neg}(x) = 1 - m_{pos}(x).$$

R2 and R4 would both have been satisfied. Fortunately, the actions entailed by both are the same. In fact, the NDT with method 1 also suggests the same action.

Method 2, however, does better than the boolean DT by keeping both B and C open, albeit at low levels. Thus the NDT is able to balance the conflicting demands of level maintenance and flow maintenance better.

There are many pathological cases which hamper the efficient working of an NDT. One such case is when the fuzzy table does not strongly assign a particular input condition to any of the actions.

The other possibility is that a point is strongly classified into two altogether different actions. What should happen in this contradictory situation is the subject of the next section.
3. Concept of incompleteness, redundancy and contradictory specification in NDTs

These concepts have been borrowed from deterministic DTs, where they have been shown to be worthwhile in properly designing algorithms to implement a set of rules (Rajaraman 1987). Let us understand what each of these terms implies:

Incompleteness: This refers to a situation wherein there is no entry for a logically allowable 
combination of conditions. 

Redundancy: Let

R1: if A1 then B1,
R2: if A2 then B1,

where the conditions of the problem impose the constraint ‘if A2 then A1’. In such situations rule R2 is redundant.

Contradictory specifications: This situation occurs when the DT demands ‘disjoint’ actions 
(i.e. actions distinct and incompatible) for a particular input data. 

Before trying to apply these concepts to an NDT, let us understand the restrictions that bind us.

1. The rule set should be clearly defined, i.e. there should not be any ‘else’ rule.

2. The actions should be crisp and discrete.

3. There should be a threshold defined on the membership grade that a particular action has in the output fuzzy set. Without this restriction NDTs are by their nature complete.

The idea of a ‘threshold’ can be formalized via the α-cut operation on fuzzy sets (Kaufmann 1975). The α-cut of a fuzzy set A (designated by $A_\alpha$) is defined as

$$A_\alpha = \{x \in U \mid m_A(x) > \alpha\}.$$

$A_\alpha$ is therefore a crisp subset of U. By putting a threshold α on A we get $A_\alpha$. The α-cut operation preserves intersection and union, i.e.

$$(A \cap B)_\alpha = A_\alpha \cap B_\alpha, \qquad (A \cup B)_\alpha = A_\alpha \cup B_\alpha,$$

with ∩ and ∪ defined for fuzzy sets by min and max respectively (Kaufmann 1975). Thus the α-cut operation can be looked upon as a homomorphism, with respect to these operations, from the set of fuzzy subsets to ordinary sets. This property has an important bearing on our problem.
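The homomorphism property can be checked on small discrete fuzzy sets. The C++ sketch below, whose names are illustrative, takes strict α-cuts as defined above. It uses min and max for fuzzy intersection and union. Both sets are assumed to list every element of the same finite universe.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <map>
#include <set>

using FuzzySet = std::map<int, double>;  // element -> membership grade

// Strict alpha-cut, as in the text: {x | m(x) > alpha}.
std::set<int> alphaCut(const FuzzySet& a, double alpha) {
    assert(alpha >= 0.0 && alpha <= 1.0);
    std::set<int> cut;
    for (const auto& [x, m] : a)
        if (m > alpha) cut.insert(x);
    return cut;
}

// Fuzzy intersection (min) or union (max); both sets list the same universe.
FuzzySet combine(const FuzzySet& a, const FuzzySet& b, bool isUnion) {
    assert(a.size() == b.size());
    FuzzySet out;
    for (const auto& [x, ma] : a) {
        const double mb = b.at(x);
        out[x] = isUnion ? std::max(ma, mb) : std::min(ma, mb);
    }
    return out;
}

std::set<int> crispIntersection(const std::set<int>& a, const std::set<int>& b) {
    std::set<int> out;
    std::set_intersection(a.begin(), a.end(), b.begin(), b.end(),
                          std::inserter(out, out.begin()));
    return out;
}

std::set<int> crispUnion(const std::set<int>& a, const std::set<int>& b) {
    std::set<int> out;
    std::set_union(a.begin(), a.end(), b.begin(), b.end(),
                   std::inserter(out, out.begin()));
    return out;
}
```

Both identities hold because min(a, b) > α exactly when a > α and b > α, and max(a, b) > α exactly when a > α or b > α.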

Let R: if x ∈ A and y ∈ B and z ∈ C then w = $A_n$ be a fuzzy decision rule, where A, B and C are fuzzy sets and $A_n$ is a crisp action.

Now if we impose a threshold α on the output fuzzy set C' of the NDT to which R belongs, then

$$m_{C'_\alpha}(A_n) = 1 \iff \min[m_A(x), m_B(y), m_C(z)] > \alpha \iff [(m_A(x) > \alpha) \wedge (m_B(y) > \alpha) \wedge (m_C(z) > \alpha)] = 1 \iff [(x \in A_\alpha) \wedge (y \in B_\alpha) \wedge (z \in C_\alpha)] = 1,$$
i.e., putting a threshold on the output set can equivalently be viewed as putting thresholds on the fuzzy sets underlying the conditions. Once thresholds are put on the fuzzy sets, they are mapped to their homomorphic crisp sets, and the long-established techniques for validating such DTs (Rajaraman 1987) can be used.

Since the homomorphic images of fuzzy sets depend on the threshold chosen, it is natural to expect the conclusions to depend on the chosen threshold level α. The steps for analyzing an NDT are:

1. Set up the threshold α and the α-cuts of the fuzzy sets underlying the conditions.

2. Draw the Karnaugh map of the boolean DT so arrived at, as explained by Rajaraman (1987):

• Cross out all squares in the Karnaugh map that are logically impossible.
• Fill in the actions dictated by the table.
• Treat all entries of an EEDT as separate conditions.
• Look for real or apparent contradictions.

We shall consider example 2 again to illustrate these steps. The threshold is set at 0.5, in which case

$$Pos_\alpha = \{x \in \mathbb{R} \mid x > 0\} \quad \text{and} \quad Neg_\alpha = \{x \in \mathbb{R} \mid x < 0\}.$$

Let us designate the action sets corresponding to R1, R2, ..., R7 as S1, S2, ..., S7. There is one logically impossible condition, i.e. C1 = Y and C2 = N, as the minimum specified flow in the interval is positive. The Karnaugh map of the DT is given in figure 3.

The crossed-out area is logically impossible under the constraints of the problem. The two empty boxes imply that there are input combinations for which no rule is satisfied with a membership grade exceeding α, i.e. the table is incomplete when the threshold is set at 0.5.

Boxes containing two or more actions are potential sites of contradiction. A contradiction may be only apparent, as in the case of S2 and S4, since both actions are the same. The other contradictions are all real contradictions.



Figure 3. Karnaugh map for the NDT 
of figure 2. 
Figure 4. NDT for example 3.


No redundancy can be seen. If there were any redundancy, the area occupied by the actions corresponding to some rule would be completely covered by that of another rule. Since this is not so, there is no redundancy.
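The Karnaugh-map analysis of example 2 can be mechanized. The C++ sketch below enumerates the 16 boxes for conditions C1 to C4 after the 0.5 α-cut and crosses out C1 = Y with C2 = N. It then counts empty boxes, boxes covered by two or more rules, and rules whose boxes are all covered by a single other rule. The rule entries are an assumption inferred from the H matrix of example 2, with C2 = Y for R1 to R5 and C2 = N for R6 and R7.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Entries after the 0.5 alpha-cut: +1 = pos./Y, -1 = neg./N, 0 = don't care.
using Conds = std::vector<int>;

// Assumed rules R1..R7 of example 2 (conditions C1..C4).
const std::vector<Conds> kRules = {
    {+1, +1, 0, 0},  {-1, +1, 0, +1}, {-1, +1, 0, -1}, {0, +1, -1, +1},
    {0, +1, -1, -1}, {0, -1, -1, +1}, {0, -1, -1, -1}};

struct MapReport {
    int emptyBoxes = 0;           // incompleteness
    int multiRuleBoxes = 0;       // potential contradictions
    std::vector<bool> redundant;  // boxes all covered by one other rule
};

bool covers(const Conds& rule, const int box[4]) {
    for (int i = 0; i < 4; ++i)
        if (rule[i] != 0 && rule[i] != box[i]) return false;
    return true;
}

MapReport analyse() {
    const std::size_t n = kRules.size();
    std::vector<std::vector<bool>> hit(16, std::vector<bool>(n, false));
    MapReport rep;
    for (int k = 0; k < 16; ++k) {
        const int box[4] = {(k & 1) ? +1 : -1, (k & 2) ? +1 : -1,
                            (k & 4) ? +1 : -1, (k & 8) ? +1 : -1};
        // Crossed out: C1 = Y with C2 = N (the minimum specified flow is positive).
        if (box[0] == +1 && box[1] == -1) continue;
        int count = 0;
        for (std::size_t j = 0; j < n; ++j)
            if (covers(kRules[j], box)) {
                hit[k][j] = true;
                ++count;
            }
        if (count == 0) ++rep.emptyBoxes;
        if (count >= 2) ++rep.multiRuleBoxes;
    }
    rep.redundant.assign(n, false);
    for (std::size_t j = 0; j < n; ++j)
        for (std::size_t i = 0; i < n; ++i) {
            if (i == j) continue;
            bool inside = true;
            for (int k = 0; k < 16; ++k)
                if (hit[k][j] && !hit[k][i]) inside = false;
            if (inside) rep.redundant[j] = true;
        }
    return rep;
}
```

Under this reading of figure 2, the sketch finds two empty boxes and four boxes claimed by two rules (including the apparent S2/S4 conflict). It finds no redundant rule, in line with the discussion above. It does not separate real from apparent contradictions, which requires comparing the actions themselves.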


Example 3. Consider another simple example. The NDT in figure 4 controls the opening of valve V depending on temperature, pressure and the states of two other valves V1 and V2. The linguistic qualifiers are defined as fuzzy sets as follows:

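Temperature (T): 0 < T < 1000°C

$$m_{T_h}(T) = \begin{cases} 0 & T < 200 \\ 0.5 + \ldots\arctan(\ldots) & T \in [200, 800] \\ 1 & T > 800 \end{cases} \qquad m_{T_l}(T) = 1 - m_{T_h}(T)$$

Pressure (P): 0 < P < 500 kg/cm²

$$m_{P_h}(P) = \begin{cases} 0 & P < 150 \\ 0.5 + \ldots\arctan(\ldots) & P \in [150, 350] \\ 1 & P > 350 \end{cases} \qquad m_{P_l}(P) = 1 - m_{P_h}(P)$$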


Consider rules R1 and R2, whose actions A1 and A2 are the same. Written out in terms of fuzzy logic operators, these rules are:

R1: (T ∈ Th) and (V1 open) ⇒ open V full,

R2: (T ∈ Th) and (V1 open) and (P ∈ Ph) ⇒ open V full.

Consider the membership grades of the outputs corresponding to these rules in C':

$$m_{O_1}(y) = \min(m_{T_h}(T), m_{open}(V_1)),$$

$$m_{O_2}(y) = \min(m_{T_h}(T), m_{open}(V_1), m_{P_h}(P)) \le \min(m_{T_h}(T), m_{open}(V_1)) = m_{O_1}(y).$$

This implies that rule R1 completely covers R2, whatever the threshold. Thus R2 is redundant, and we drop it from the shortened table. Take a threshold of 0.5 and draw the Karnaugh map of the shortened DT as shown in figure 5.
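The threshold-free argument above generalizes to a simple syntactic test. Suppose two rules prescribe the same action and the conditions of one are a subset of the conditions of the other. The larger rule's grade is then a minimum over more terms and can never exceed the smaller rule's grade, so it is redundant for every threshold. The encoding of rules as maps from condition variable to fuzzy-set label is hypothetical, chosen for this C++ sketch.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical encoding: condition variable -> fuzzy-set label, plus the action.
struct FuzzyRule {
    std::map<std::string, std::string> conditions;
    std::string action;
};

// True if some other rule with the same action uses a subset of rule j's
// conditions; rule j's grade is then never higher, whatever the threshold.
// (Two identical rules flag each other; keep one of them.)
bool redundantForAllThresholds(const std::vector<FuzzyRule>& rules, std::size_t j) {
    assert(j < rules.size());
    const auto& larger = rules[j].conditions;
    for (std::size_t i = 0; i < rules.size(); ++i) {
        if (i == j || rules[i].action != rules[j].action) continue;
        const auto& fewer = rules[i].conditions;
        const bool subset = std::all_of(fewer.begin(), fewer.end(), [&larger](const auto& kv) {
            const auto it = larger.find(kv.first);
            return it != larger.end() && it->second == kv.second;
        });
        if (subset) return true;
    }
    return false;
}
```

Applied to R1 and R2 of example 3, the test flags R2 but not R1, as in the argument above.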
Figure 5. Karnaugh map for example 3. (The table has been shortened, but the numbering of actions remains the same.)

As one can see, the table is

• incomplete, as there are empty (uncrossed) boxes;

• contradictory, as (A1 and A5) and (A1 and A4) are incompatible; (A3 and A5) is not a real contradiction, as both actions are the same;

• not redundant, as no reduction in rules is possible: no rule's area is completely covered by another's.

An interesting feature of this example is that redundancy was detected and removed without reference to a threshold. This procedure, although the most satisfying, is very difficult to carry out in general.

So far in our discussion of fuzzy process control and of implementing NDTs to control processes, the ‘threshold’ was never mentioned, whereas this section treats the ‘threshold’ as the crux of the whole matter. The connection between these apparently dichotomous ideas lies in the nature of human decision making. Human decision rules, although not incomplete, are contradictory and often redundant. To be able to remove these defects one has to impose some determinism via the threshold. This is by no means undesirable: it safeguards against some serious lacunae of unconstrained NDTs.

1. Suppose that for a particular input the highest membership grade is 0.2. Without a threshold, the corresponding action would still be taken. If the threshold is put at 0.4, the Karnaugh map of the controller will indicate that it is incomplete and will caution the designer about it. It will, therefore, prevent an action from being executed if its ‘confidence value’ is very low.

2. Suppose two actions have very close membership grades in the output set C', say 0.6 and 0.62. The NDT will choose the action corresponding to 0.62 and execute it. A suitably chosen threshold will bring out this fact and warn the designer about the phenomenon. It implies that the two rules are closely allied, and restructuring them might reduce the number of rules.
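The two safeguards can be phrased as design-time checks on a single output set C'. In this illustrative C++ sketch, a rule counts as firing when its grade exceeds the threshold α. The table is flagged as incomplete for that input when no rule fires. It is flagged as contradictory when rules with different actions both fire.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

struct Warning {
    bool incomplete;     // no rule's grade exceeds alpha
    bool contradictory;  // rules with different actions both exceed alpha
};

// Design-time check of one output set C' (grades[j] belongs to rule j,
// whose action is actions[j]) against the threshold alpha.
Warning checkOutput(const std::vector<double>& grades,
                    const std::vector<std::string>& actions, double alpha) {
    assert(grades.size() == actions.size());
    Warning w{true, false};
    std::string firstAction;  // action of the first rule above alpha
    for (std::size_t j = 0; j < grades.size(); ++j) {
        if (grades[j] <= alpha) continue;
        if (w.incomplete) {
            w.incomplete = false;
            firstAction = actions[j];
        } else if (actions[j] != firstAction) {
            w.contradictory = true;
        }
    }
    return w;
}
```

Take the grades 0.6 and 0.62 of point 2, attached to different actions, with α = 0.5: the check reports a contradiction. An output set whose best grade is 0.2 is reported as incomplete at α = 0.4.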

Thus choosing the right threshold for analysis is of utmost importance in the proper design of NDTs. The choice is very subjective and changes from one application to another. A very low threshold means that the controller may be taking actions at very low confidence values. A very high threshold, on the other hand, forces the controller to take an action only if a rule is satisfied to a very high degree, thereby denying the nonspecificity of conditions and tending towards a boolean DT. A judicious threshold therefore has to be chosen.

One thing must, however, be kept in mind. We have moved to a fuzzy controller because a boolean DT cannot work, which implies that we can tolerate some amount of contradiction. If we want to exploit the judgemental nature of NDTs, thresholds must be used for design and never during implementation. It would be grossly erroneous to assume that an NDT is blind to all inputs that belong to a box showing a contradiction when the threshold is set at α; it is blind only with α as the benchmark. Since the action corresponding to the maximum membership grade is performed, most contradictions are resolved.

4. Conclusions 

We have seen in this paper that nondeterministic decision tables (NDTs) capture the smooth control exercised by a human operator while transitioning between actions. They cope well with situations where the aims may be contradictory and the control action requires a weighted judgement. In attempting to capture the judgemental nature of human decision making, however, they become vulnerable to the drawbacks that go with fuzzy modelling of conditions. The concepts of incompleteness, redundancy and contradiction, which are so well defined for boolean DTs and help in their design and validation, are not so well specified for NDTs. To define them for NDTs one has to introduce the concept of a threshold. A threshold, if applied to an NDT, kills the very spirit of fuzziness, as it converts the NDT into a boolean DT. However, a well-chosen threshold (choosing one is highly subjective) and the analysis of the resulting boolean DT can give the designer of the NDT an idea of its quality, although they cannot directly help improve it.

The formulation of NDT presented here is much more general than the FDT suggested by Francioni & Kandel (1988). The NDT given here reduces to an FDT if thresholds are applied to the fuzzy sets to segment the input space into distinct blocks and the rules are recalculated using the procedure given here. As stated earlier, such a table is incapable of taking the ‘judgemental’ decisions crucial for process control.

Summing up, NDTs are good operational tools for process control, but for them to be put on the same pedestal as DTs one would have to look for more satisfying concepts for defining their completeness etc., which not only evaluate NDTs but also suggest methods for improving them.
Abstract. The objective of this study is to explore the possibility of capturing, with an artificial neural network, the reasoning process used in bidding a hand in the game of bridge. We show that a multilayer feedforward neural network can be trained to make an opening bid with a new hand. The game of bridge, like many other games used in artificial intelligence, can easily be represented in a machine. But, unlike most such games, bridge uses subtle reasoning over and above the agreed conventional system to make a bid from the pattern of a given hand. Although it is difficult for a player to spell out the precise reasoning process he uses, we find that a neural network can indeed capture it. We demonstrate the results for the case of one-level opening bids, and discuss the need for a hierarchical architecture to deal with bids at all levels.

Keywords. Artificial neural networks; backpropagation; games; contract 
bridge bidding; knowledge; artificial intelligence. 


1. Introduction

Artificial Neural Networks (ANN) can be trained to capture the implicit associations between an input pattern and the corresponding output response of complex systems (Haykin 1994). Often such associations can be used to perform tasks like pattern classification and pattern mapping. But associations can be very complex, depending upon the richness of the domain. We consider one such domain, namely, bidding in the game of Contract Bridge.

In contract bridge, a player makes a bid to convey information about the pattern of the thirteen cards in his hand. If he is the first to make a bid, the bid is called the “opening bid”, and he makes it based only on the pattern of the cards he is holding. That is, he has no a priori knowledge of the cards in the other players’ hands. During the rest of the auction, however, he makes each bid based not only on his own cards but also on the bids that have been made till then. The problem of a machine making even an opening bid in this game is challenging. Many hand patterns and situations are not covered precisely by the “rules” of the convention system used in the game. Thus a straightforward application of a rule-based system cannot logically explain the reasoning process used by a player in making a bid. The objective of this paper is to explore the possibility of capturing the reasoning process of a player in making an opening bid based on the pattern of the cards presented to him. It may be noted that although there are rules describing the conventions (such as Standard American) of the bidding system used by a player, the player uses these conventions only as a guideline for making a bid with a given hand. It is generally difficult to state precisely and consistently the reasoning process used in making a particular bid. Sometimes the same player may make a different bid for a given hand at a different time. It is this variability in the reasoning of the player that we intend to capture, if possible, by an artificial neural network.

Games have been of interest to Artificial Intelligence (AI) researchers because they 
provide well defined domains which are nevertheless rich in computational complexity. 
Games like bridge and chess have been engaging the attention of AI researchers for a 
long time. The intrinsic symbolic nature of these games implies that one does not have 
to compromise in representation. In contrast, many compromises have to be made in 
representation while dealing with real world problems, such as in speech recognition and 
image understanding. 

Traditionally, in the context of AI, game-theoretic methods are applicable to two-person complete-information games like chess. Bridge, however, is different (Khemani & Ramakrishna 1984). Making a bid, in particular, is an incomplete-information activity. A knowledge-based approach seems to be better suited. But we observe that subtle interactions between many simple rules make the traditional rule-based approach difficult to implement. It is here that we hope ANNs might prove more useful.

In § 2 we describe the nature of the bidding problem and the scope of the present work. In § 3 we discuss the issues involved in applying ANNs to the bidding problem. The basic network architecture and experiments with different numbers of nodes are described in §§ 4 and 5, respectively. A hierarchical architecture is suggested in § 5 to overcome some of the difficulties of the basic network.


2. Bidding problem: Scope of the present study 

The goal of bidding in bridge is to cooperatively estimate the playing strength of the two hands held by a partnership, and to arrive at an optimal contract. Some details of the game of bridge are given in appendix A. Since each player can see only his/her own hand, cooperation with the partner takes the form of an exchange of information encoded in the bids. The formal interpretation of staking a claim through a bid is described below.

A bid consists of two parts.

(a) The level, which determines the number of tricks the bidding side is claiming to make.

(b) The denomination, i.e. the suit which the side selects as “trump”. For example, a bid of “3 spades” (3S) means that with spades as trump the side will make nine (3 + 6) tricks. Apart from the four suits (spades (S), hearts (H), diamonds (D) and clubs (C)), one can also choose to bid “no trumps” (N), meaning that no suit is trump and all are equal.
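As a small illustration of the two parts of a bid, the Python sketch below splits a bid written in this paper's notation (for example "3S") into its level and denomination, and computes the number of tricks contracted as level + 6. The helper name `parse_bid` is hypothetical.

```python
DENOMINATIONS = {"C": "clubs", "D": "diamonds", "H": "hearts", "S": "spades", "N": "no trumps"}

def parse_bid(bid: str):
    """Split a bid such as '3S' into (level, denomination, tricks contracted)."""
    if len(bid) != 2 or not bid[0].isdigit() or bid[1] not in DENOMINATIONS:
        raise ValueError(f"not a valid bid: {bid!r}")
    level = int(bid[0])
    if not 1 <= level <= 7:
        raise ValueError(f"level must be 1..7: {bid!r}")
    # The first six tricks are implicit, so a bid at level L claims L + 6 tricks.
    return level, DENOMINATIONS[bid[1]], level + 6
```

For example, `parse_bid("3S")` returns `(3, "spades", 9)`, matching the nine tricks in the text.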



Neural networks for bridge bidding 




The different bids that can be made have an order imposed by the rules of the game, as follows: 1C, 1D, 1H, 1S, 1N, 2C, 2D, 2H, 2S, 2N, 3C, 3D, ... up to 7N. Bidding can only proceed from left to right in this order. That is, apart from “Pass” (P), the no-bid option, a player can only choose a bid to the right of the last bid he heard. In addition, a “double” can be used (bid) over an opponent’s bid, and a “redouble” over an opponent’s double. Thus the total number of bids available to exchange information is limited.
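The ordering just described can be sketched in a few lines of Python. The list `BIDS` enumerates the 35 contract bids in auction order, and a legal continuation is any bid to the right of the last one. Pass, double and redouble are deliberately left out, and all names are illustrative.

```python
DENOMINATION_ORDER = "CDHSN"  # within a level: clubs lowest, no trumps highest

# All 35 contract bids in auction order: 1C, 1D, 1H, 1S, 1N, 2C, ..., 7N.
BIDS = [f"{level}{d}" for level in range(1, 8) for d in DENOMINATION_ORDER]

def legal_contract_bids(last_bid=None):
    """Contract bids that may follow last_bid (Pass, double, redouble not included)."""
    if last_bid is None:
        return list(BIDS)
    return BIDS[BIDS.index(last_bid) + 1:]

# Opening choices from Pass up to the 4 level: the 21 bids mentioned in section 2.1.
opening_choices = ["P"] + [bid for bid in BIDS if int(bid[0]) <= 4]
```

After 2S, for instance, the cheapest legal bid is 2N, and nothing can follow 7N.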

A further constraint is imposed by the fact that bidding at higher levels implies a greater commitment in terms of the number of tricks, which may not be backed up in play by the strength of the cards. Thus making a bid at a higher level has to be a judicious choice between the need to convey more information and the risk of overshooting the estimate of the number of tricks that can be made. Many bidding systems have evolved in an effort to make optimum use of the limited bidding space for conveying maximum information.

Bidding is thus a distributed cooperative process whereby a team of two players endeavors to find an optimum contract. During the course of a bidding sequence a player may aim to convey the suits held by him, support for the suits held by the partner, the strength of his hand in terms of high cards (such as Ace, King, Queen and Jack, which are usually encoded as numerical values), and other kinds of strength (such as singletons and voids). The estimated strength of the hand changes dynamically, depending on the information received from the partner. The opponents’ bids may obstruct this exchange of information, or they may supply additional information.

2.1 Information in the hands 

The fifty-two cards of a pack can be dealt in about 5.36 × 10^28 ways, and thirteen cards can be picked from fifty-two in about 6.35 × 10^11 ways. There are 39 possible hand patterns, ranging from the most balanced 4-3-3-3 shape (4 cards of one suit, and 3 each of the rest) to the most unbalanced 13-0-0-0. These patterns and the percentage probability with which each appears are shown in table 1. The percentage probabilities listed under Total do not distinguish between the suits. The column named Specific lists the percentage probabilities when each suit is also specified along with its length. For example, the percentage probability of getting an 8-2-2-1 hand (pattern number 24) is 0.1924 percent. However, the percentage probability of being dealt 8 clubs, 1 heart, and 2 each of diamonds and spades is one twelfth of 0.1924, i.e., 0.016 percent, since there are twelve ways of assigning the lengths 8, 2, 2 and 1 to the four suits.
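The counts quoted above and the entries of table 1 follow from elementary combinatorics. A hand with lengths n1, ..., n4 in named suits can be chosen in C(13, n1) · ... · C(13, n4) ways out of C(52, 13). A shape's Total sums this over its distinct suit assignments. The Python sketch below recomputes these quantities. The function names are our own.

```python
from itertools import permutations
from math import comb, factorial, prod

N_HANDS = comb(52, 13)                         # distinct 13-card hands, about 6.35e11
N_DEALS = factorial(52) // factorial(13) ** 4  # distinct deals of four hands, about 5.36e28

def specific_pct(lengths):
    """Percent chance of holding exactly these lengths in named suits (S, H, D, C)."""
    return 100 * prod(comb(13, n) for n in lengths) / N_HANDS

def total_pct(pattern):
    """Percent chance of a shape, summed over its distinct assignments to the four suits."""
    return sum(specific_pct(p) for p in set(permutations(pattern)))

def all_patterns():
    """Every shape (a, b, c, d) with a >= b >= c >= d >= 0 and a + b + c + d = 13."""
    return [(a, b, c, 13 - a - b - c)
            for a in range(13, -1, -1)
            for b in range(a, -1, -1)
            for c in range(b, -1, -1)
            if 0 <= 13 - a - b - c <= c]
```

`all_patterns()` yields the 39 shapes. `total_pct((4, 4, 3, 2))` and `total_pct((8, 2, 2, 1))` round to the 21.5512 and 0.1924 listed in table 1, and the twelve suit assignments of 8-2-2-1 account for the factor of one twelfth.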

The aim of a bidding system is to convey the maximum amount of information for the cases which are most frequent. Obviously, not all possible hands can be described, because of the “low bandwidth” of the bidding channel. Usually, for instance, players open the bidding with a choice of 21 bids, from pass up to the 4 level. It would be wasteful to reserve one of these bids to describe a hand of, say, 7-6-0-0 shape, which occurs only about once in twenty thousand hands.

Table 1. Distribution of hand patterns.

No.  Pattern    Total (%)       Specific (%)
 1   4-4-3-2    21.5512         1.796
 2   4-3-3-3    10.5361         2.634
 3   4-4-4-1     2.9932         0.748

 4   5-3-3-2    15.5165         1.293
 5   5-4-3-1    12.9307         0.539
 6   5-4-2-2    10.5797         0.882
 7   5-5-2-1     3.1739         0.264
 8   5-4-4-0     1.2433         0.104
 9   5-5-3-0     0.8952         0.075

10   6-3-2-2     5.6429         0.470
11   6-4-2-1     4.7021         0.196
12   6-3-3-1     3.4482         0.287
13   6-4-3-0     1.3262         0.055
14   6-5-1-1     0.7053         0.059
15   6-5-2-0     0.6511         0.027
16   6-6-1-0     0.0723         0.006

17   7-3-2-1     1.8808         0.078
18   7-2-2-2     0.5129         0.128
19   7-4-1-1     0.3918         0.033
20   7-4-2-0     0.3617         0.015
21   7-3-3-0     0.2652         0.022
22   7-5-1-0     0.1065         0.005
23   7-6-0-0     0.0056         0.0005

24   8-2-2-1     0.1924         0.016
25   8-3-1-1     0.1176         0.010
26   8-3-2-0     0.1085         0.005
27   8-4-1-0     0.0452         0.002
28   8-5-0-0     0.0031         0.0003

29   9-2-1-1     0.0178         0.001
30   9-3-1-0     0.0100         0.0004
31   9-2-2-0     0.0082         0.0007
32   9-4-0-0     0.0010         0.00008

33   10-2-1-0    0.0011         0.00004
34   10-1-1-1    0.0004         0.0001
35   10-3-0-0    0.00015        0.00001

36   11-1-1-0    0.00002        0.000002
37   11-2-0-0    0.00001        0.000001

38   12-1-0-0    0.000003       0.0000003

39   13-0-0-0    0.09 × 10^-8   0.02 × 10^-7

Values listed under Specific are for named suits having the specified lengths. Numbers under Total sum up the values for all possible ways of assigning suits to the given pattern or shape.

Since four-card and five-card suits are the most likely, most bidding systems are designed to take this a priori knowledge into account. Thus, a simple bidding rule might be “IF you have at least 4 cards in Spades THEN bid Spades”. A bidding system is usually a collection of such rules, together with rules that take into account the strength of the high cards. For example, a rule may say “IF you have 13 points THEN you can make an opening bid”. Here 13 is a numerical encoding of the card strength, which counts an Ace as 4, a King as 3, a Queen as 2 and a Jack as 1. A combination of the above two sets of rules may thus allow a player to bid 1 spade (1S) with the following hand,

Q 8 6 3 2, K J 5, A 9 7, Q J, 

where the suits from left to right are: Spades, Hearts, Diamonds, and Clubs. 
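The point count and the two example rules can be sketched in Python as a toy rule base. The hyphen-separated hand notation (spades-hearts-diamonds-clubs, as used in table 3) and all function names are assumptions for illustration. This is not the authors' program.

```python
HCP = {"A": 4, "K": 3, "Q": 2, "J": 1}  # the 4-3-2-1 high card point count

def parse_hand(text):
    """Parse 'Q8632-KJ5-A97-QJ' (spades-hearts-diamonds-clubs) into four rank strings."""
    suits = text.replace(" ", "").split("-")
    if len(suits) != 4 or sum(len(s) for s in suits) != 13:
        raise ValueError(f"expected 13 cards in four suits: {text!r}")
    return suits

def high_card_points(suits):
    return sum(HCP.get(card, 0) for suit in suits for card in suit)

def naive_opening(suits):
    """Toy rules: open with 13+ points, and bid spades with 4+ spades."""
    if high_card_points(suits) < 13:
        return "P"
    if len(suits[0]) >= 4:
        return "1S"
    return None  # the two toy rules say nothing about this hand
```

The example hand Q8632-KJ5-A97-QJ has 13 points and five spades, so the toy rules open 1S. The same rules also return 1S for both AJ93-K84-K743-A9 and AJ93-K8-K743-A94, which is exactly the limitation of a rule-based approach that the next subsection discusses.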

2.2 Why not a rule-based system?

One could thus think of building a rule-based system to help in making a bid. The difficulty with such an approach is that players normally use these rules only as a guideline, and often make bids whose reasoning they cannot articulate in terms of the given rules. For example, for a hand containing 4 spades and 4 diamonds, a rule may suggest opening 1S, or possibly 1D. But for the two hands given below, which are only slightly different, a player may choose different bids as shown:

AJ93, K84, K743, A9    bid: 1S
AJ93, K8, K743, A94    bid: 1C

The difference arises from the subtle reasoning process the player uses, since he is also concerned about his next bid. Other patterns in the hand also enter into such reasoning. For example, “K84” is a “support” for a possible bid of 2H by partner, while “K8” is not. “A94” in clubs, on the other hand, is an openable suit if the hand has no five-card suit. One could possibly list all such possibilities as rules, but the rules may become too many. One of the aims of our experiment is to test whether an ANN can capture this reasoning from the input patterns which expert players seem to be using. A neural network could capture the implicit knowledge because it learns from examples. In this case, therefore, an expert need not articulate all the rules, but need only apply them appropriately while making a bid.

Another interesting feature of bidding is that the bid for a given hand need not be unique. Two or more alternative bids for the same hand may seem reasonable to human players. The choice is also likely to be influenced by contextual factors in the game, like the past history of the match and the disposition of the player. Thus, in general, a bid is judged subjectively with respect to the available knowledge. Moreover, this knowledge is often dynamic, since a posteriori analysis causes new associations to be learnt.


2.3 Scope - opening bids 

In this paper we consider only the opening bids, when the player is first to bid, and no other 
information is available or relevant. This is the simplest case where the player has to make 
a bid based only on the pattern in the 13 cards of his hand. The bidding is not influenced by 
other factors such as bids by other players, the position of the game, etc. However the bid is 
influenced by the bidding convention that players adopt. The convention system (bidding 
system) helps a player to look for pattern features to evaluate the strength of a given hand. 

The reason for considering the opening bid is that it is the only bid that is made solely on the basis of the pattern in the given hand. Any bid made after the opening bid also utilizes the information conveyed by the preceding bids, besides the given hand pattern. For example, if your partner has opened with a bid of 1S, then you may bid 2S with the following hand,

Q 9 8 5, A 3, J 8 5 4, 9 6 5.

Clearly, the bid of 2S is based on the information that the partner has a good hand with a spade suit. This information has to be combined with the pattern of the given hand to produce the bid of 2S. Also, in some sense the information contained in the given hand is in a raw form, while the information received from the bid of 1S is in an abstracted form. Combining such information is a more complex process and beyond the scope of this paper.






B Yegnanarayana et al 


2.4 Bidding convention 

The bidding convention assumed in the current exercise is the Standard American bidding system. This was chosen because it is less artificial than other systems, for example the Precision bidding system. All systems use some bids whose semantics are not straightforward. Artificial systems use more of such bids, whose meanings are very well defined but may have no direct relation to the obvious pattern in the hand. For example, a 2D bid in the Precision system denies a diamond suit, rather than promising the suit. Natural systems, on the other hand, have more direct mappings. In the process they also allow for subtle changes in the bidding rules, often called “expert judgment”, a phenomenon we hope to capture.

There are about 6.35 × 10^11 different hands a player can start with. A look at table 1 shows that hand patterns containing seven or more cards in a suit constitute less than 5% of this number. A random hand generator will therefore find it difficult to produce representative samples of all possible patterns. Seven-card or longer suits may normally be opened only at the three level or above. Long training cycles would become necessary if the training set were large enough to include all these less frequently occurring patterns. In this paper we have limited our study to low-level bids (up to the 2 level). This decision was arrived at by a process of experimentation described later in the paper.


3. Why artificial neural networks? 

Artificial neural networks are attractive because they learn from examples. If in the process they can generalize (Hertz et al 1991), then they can also provide a useful interpolation capability. Owing to this generalization capability, the network may capture the subtle reasoning process used in making a bid, which may be difficult to incorporate explicitly in a rule-based formulation.


Table 2. Variations in players’ bids.

Set no.   No. of people   No. of hands for which   %
          who bid         their bids differed
1         2               14                       17.50
2         3               17                       21.25
3         2               26                       32.50
4         2               24                       30.00
5         2               14                       17.50
6         2               13                       16.25
7         2               17                       21.25
8         2               29                       36.25

Sets of 80 hands were bid by human players. It can be observed that for every set the players differed on a significant number of hands, suggesting that more than one correct bid may exist.
The bidding problem, as we see it, is an exercise in high-level perception. It involves mapping complex patterns in a hand onto a single output corresponding to the bid for the hand. ANNs do precisely this. Given a set of input patterns and the corresponding target patterns, they try to capture the implicit relation between the two. Once the network has been trained to generalize, it can respond meaningfully to a new pattern.

Realizing a network even for the opening bid for a given hand is not a trivial task. In fact, as table 2 shows for data collected at a bridge tournament, the relation between the input pattern and the corresponding output bid is not unique even amongst expert players.

We have decided to explore the possibility of training a multilayer feedforward neural network with the backpropagation algorithm (Rumelhart et al 1986) to capture the implicit reasoning from several examples of input (pattern) and output (bid) pairs. In order to perform this study several other issues need to be considered. Some of them are described below.

3.1 Representation 

The input can be represented as raw data as shown in figure 1a, where the input selects 13 of the 52 input nodes, representing the 13 cards that have been dealt. Thus the input layer here consists of 52 nodes, each of which may have a value of either 1 or 0.

The input can also be represented in the form of feature patterns as shown in figure 1b. These patterns are based on the evaluation of the strength of the hand by a bidding system. In this representation there are 16 nodes in the input layer, which can take values between −1 and +4. Thirteen nodes are used to represent the cards, while three are used as markers (−1) between suits. In this representation an attempt was made to feed some feature information in the form of relative weights given to various cards. These weights were close to the points given to high cards in most bidding systems. Bidding systems use numerical values for Aces, Kings, Queens and Jacks, as described earlier. In our representation we gave appropriate values to the other cards as well. Our initial experimentation showed that the first representation was preferable, as the feature representation is somewhat subjective. During training we found that the network converged with the first representation, whereas the feature-based representation failed to converge in some cases. This is interesting because the information in the second case is in an interpreted, or abstracted, form. It appears that abstraction from raw data, if not done properly, may not be useful for obtaining generalization by a network. Experiments described in this paper therefore use the first representation.
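The 16-node feature representation can be sketched as follows: 13 card weights laid out suit by suit, with a −1 marker between consecutive suits. The paper does not list the values it gave to cards below the Jack. The 0.5 for a ten and 0.25 for a nine used here are purely hypothetical, as is the function name.

```python
# Honour weights follow the 4-3-2-1 count; the spot-card weights are hypothetical.
CARD_WEIGHT = {"A": 4.0, "K": 3.0, "Q": 2.0, "J": 1.0, "T": 0.5, "9": 0.25}
MARKER = -1.0  # separates consecutive suits

def feature_vector(hand):
    """16 values for a hand such as 'K753-KJ8-K87-K76': card weights plus 3 suit markers."""
    vec = []
    for i, suit in enumerate(hand.split("-")):
        if i > 0:
            vec.append(MARKER)
        vec.extend(CARD_WEIGHT.get(card, 0.0) for card in suit)
    return vec
```

Every value lies between −1 and +4, as the text requires. A void suit simply shows up as two adjacent markers.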

It is also interesting to note that, from the representation of the raw data shown in figure 1a, the network is able to extract the necessary pattern information to capture the reasoning process without the pattern being described explicitly. This is to be contrasted with pattern recognition problems in speech and vision, where features have to be extracted carefully from the input data and fed as input to the neural network. In fact, the performance critically depends on the feature extraction stage in such cases.

The second important issue is the representation of the bids. A straightforward representation of all possible bids, each by a separate output node, leads to problems in training. Since there are far fewer sample hands for higher-level bids than for lower-level bids, the network may not converge for a given set of limited data.
[Figure 1: input-layer patterns for two hands (image not reproduced). Hand 2 is S: Q9, H: 3, D: KQJ8542, C: QT4.]

Figure 1. Illustration of input layer patterns for two hands. In the first representation (a) the input is in raw form, while in the second (b) some features (high card points) have been extracted. The networks perform better with the raw information.


To study the convergence behaviour of the network, we have conducted experiments with networks limited to different numbers of output nodes, corresponding to different levels of bidding. These experiments suggest that a hierarchical network approach may be needed for implementing the opening bid problem.

A third issue is the interpretation of the output results. It is quite possible that during testing the network produces a bid which is different from a player’s bid. In that case the player should also check whether the network’s bid is reasonable for the given hand.

We have provided one node in the output layer for each possible bid. Starting with a network with output nodes for all possible bids up to the seven level, we reduced the size of the network to cover bids only up to the two level, at which size it could be trained within a reasonable time.

There are two ways in which the output can be interpreted. One way is to accept the bid represented by the node giving the highest activation value. But one has to be careful here, as two likely bids may have nearly equal output activation values. The second way is to use a threshold, so that any node with an activation value above the threshold is considered a plausible bid. This has the drawback that no such node may exist for a particular example, in which case the output has to be reexamined. We have adopted the first strategy, picking the node with the highest activation value as the output bid.
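The two interpretation strategies can be compared in a short Python sketch. `argmax_bid` is the strategy adopted here. `threshold_bids` is the alternative, and it can return an empty list. `is_close_call` flags the near-tie the text warns about. The threshold of 0.5 and the margin of 0.1 are hypothetical defaults, not values from the paper.

```python
def argmax_bid(outputs, labels):
    """Adopted strategy: the bid of the output node with the highest activation."""
    best = max(range(len(outputs)), key=outputs.__getitem__)
    return labels[best]

def threshold_bids(outputs, labels, theta=0.5):
    """Alternative strategy: every bid whose activation exceeds theta (may be empty)."""
    return [label for out, label in zip(outputs, labels) if out > theta]

def is_close_call(outputs, margin=0.1):
    """True when the two most active nodes are nearly tied."""
    top, second = sorted(outputs, reverse=True)[:2]
    return top - second < margin
```

With hypothetical 1-level activations `[0.05, 0.42, 0.10, 0.38, 0.02, 0.07, 0.01]` for `P, 1C, 1D, 1H, 1S, 1N, U`, the argmax strategy bids 1C. No node clears 0.5, so the threshold strategy returns nothing, and the 1C/1H pair is flagged as a close call.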


4. Backpropagation network for opening bid problem 

A multilayer feedforward network was used, with the backpropagation algorithm (Rumelhart et al 1986) for training. There are 52 nodes in the input layer. The number of nodes in the hidden and output layers was varied as described in the following experiments.
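As an illustration of the training set-up, here is a minimal pure-Python sketch of a one-hidden-layer sigmoid network trained by online backpropagation on squared error. The learning rate, the initial weight range and the random seed are assumptions, since the paper does not state them. The 2-NT configuration would correspond to `MLP(52, 60, 12)`. We assume that the tss of tables 4 and 6 is the per-epoch sum of squared output errors, which `epoch` returns here.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

class MLP:
    """One hidden layer of sigmoid units, trained by online backpropagation."""

    def __init__(self, n_in, n_hidden, n_out, lr=0.5, seed=0):
        rng = random.Random(seed)
        self.lr = lr
        # Each weight row carries a trailing bias weight.
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)] for _ in range(n_hidden)]
        self.w2 = [[rng.uniform(-0.5, 0.5) for _ in range(n_hidden + 1)] for _ in range(n_out)]

    def forward(self, x):
        xb = list(x) + [1.0]
        hb = [sigmoid(sum(w * v for w, v in zip(row, xb))) for row in self.w1] + [1.0]
        y = [sigmoid(sum(w * v for w, v in zip(row, hb))) for row in self.w2]
        return xb, hb, y

    def train_step(self, x, target):
        """One weight update for one pattern; returns its squared error before the update."""
        xb, hb, y = self.forward(x)
        # Output deltas for E = 1/2 * sum (t - y)^2 with sigmoid units.
        d_out = [(t - o) * o * (1 - o) for t, o in zip(target, y)]
        # Hidden deltas, propagated back through the not-yet-updated output weights.
        d_hid = [hb[j] * (1 - hb[j]) * sum(d_out[k] * self.w2[k][j] for k in range(len(d_out)))
                 for j in range(len(self.w1))]
        for k, row in enumerate(self.w2):
            for j in range(len(row)):
                row[j] += self.lr * d_out[k] * hb[j]
        for j, row in enumerate(self.w1):
            for i in range(len(row)):
                row[i] += self.lr * d_hid[j] * xb[i]
        return sum((t - o) ** 2 for t, o in zip(target, y))

    def epoch(self, data):
        """One pass over (input, target) pairs; returns the total sum of squared errors."""
        return sum(self.train_step(x, t) for x, t in data)
```

Input vectors would be the 52-element 0/1 card encodings, and targets would be one-hot vectors over the output bids.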


4.1 Data generation 

The hands used for training the network were generated by a program which simulates the shuffling of cards. The distribution of the hand patterns generated by the program matches the distribution given in table 1. A representative set of 19 hands is given in table 3. Note that hands whose longest suit has at most five cards constitute about 80% of all hands. On the other hand, bids of 2H and 2S, in the configuration named the 2-NT network, require the following features: a six-card suit, no singleton or void in the hand, about 8 to 10 high card points, and most of the high cards in the bid suit. To train the network successfully for these bids, the training set must contain a large number of such samples. This in turn would mean a correspondingly large training set and hence a long training period. We have therefore used a generating program to produce hands satisfying a given set of constraints, for example, a heart suit of at least 6 cards and at least 6 points. In this way we can produce more of the hands whose patterns we want the system to learn. But this method of generating a large number of sample hands does not reflect the pattern environment of the bidding problem. A pattern environment is described by the patterns together with the probabilities of occurrence of these patterns.
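One simple way to realize both generators is shown below. Shuffling is simulated by drawing 13 distinct cards uniformly at random. Constrained hands are obtained by rejection sampling: deal until the predicate holds. Rejection sampling is our assumption, since the paper does not say how its generating program enforces constraints, and all function names are hypothetical.

```python
import random

RANKS = "AKQJT98765432"
SUITS = "SHDC"
DECK = [s + r for s in SUITS for r in RANKS]  # e.g. 'SA' is the ace of spades
HCP = {"A": 4, "K": 3, "Q": 2, "J": 1}

def deal_hand(rng):
    """Simulated shuffle: a uniformly random 13-card hand."""
    return rng.sample(DECK, 13)

def points(hand):
    return sum(HCP.get(card[1], 0) for card in hand)

def suit_length(hand, suit):
    return sum(card[0] == suit for card in hand)

def constrained_hand(rng, predicate, max_tries=1_000_000):
    """Rejection sampling: keep dealing until the hand satisfies the predicate."""
    for _ in range(max_tries):
        hand = deal_hand(rng)
        if predicate(hand):
            return hand
    raise RuntimeError("constraint too rare for max_tries")

def as_text(hand):
    """Format as 'S-H-D-C', e.g. 'K753-KJ8-K87-K76'."""
    return "-".join("".join(r for r in RANKS if s + r in hand) for s in SUITS)
```

For example, `constrained_hand(random.Random(7), lambda h: suit_length(h, "H") >= 6 and points(h) >= 6)` returns a hand meeting the constraint quoted in the text. As the paragraph notes, oversampling rare shapes this way distorts the natural pattern environment of table 1.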


4.2 Representation of input patterns 

The input layer has 52 nodes, one for each card. Each node has a value of 1 if the card is present in the given hand, and 0 if it is not. Thus the input vector marks out a subset of 13 of the 52 cards. Each training data pair consists of 13 activated nodes in the input layer and one desired activation node in the output layer. For example, the first hand in table 3 and the corresponding input pattern vector are given by

Hand: K753-KJ8-K87-K76 

Input pattern: 0100000101010010100100000001000011000000100000110000. 
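The example vector above is consistent with ordering the nodes by suit (spades, hearts, diamonds, clubs) and, within each suit, from Ace down to Two. The short Python sketch below reproduces the vector under that inferred ordering. The function name is ours.

```python
RANKS = "AKQJT98765432"  # within-suit order implied by the example vector

def encode(hand_text):
    """52-bit raw input for a hand written 'S-H-D-C', e.g. 'K753-KJ8-K87-K76'."""
    bits = []
    for suit in hand_text.split("-"):  # spades, hearts, diamonds, clubs
        bits.extend("1" if rank in suit else "0" for rank in RANKS)
    return "".join(bits)
```

`encode("K753-KJ8-K87-K76")` returns exactly the input pattern shown above. A void suit, written as an empty field, becomes thirteen zeros.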


The desired output is activation of the node corresponding to the bid of 1C in the 2-NT network. The 2-NT network has output nodes corresponding to all five one-level bids, all five two-level bids, a “pass” (P) bid and an “unknown” (U) category bid. The “unknown” category includes all hands corresponding to levels above the 2-NT level.
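The desired outputs of the 2-NT network can be sketched as one-hot vectors over its 12 nodes. Any bid above 2N is mapped to the “unknown” node. The order of the nodes in `OUTPUTS_2NT` is an assumption.

```python
OUTPUTS_2NT = ["P", "1C", "1D", "1H", "1S", "1N", "2C", "2D", "2H", "2S", "2N", "U"]

def target_vector(bid):
    """One-hot desired output for the 2-NT network; bids above 2N map to 'U'."""
    label = bid if bid in OUTPUTS_2NT else "U"
    return [1.0 if node == label else 0.0 for node in OUTPUTS_2NT]
```

Thus the first hand of table 3 (bid 1C) activates the second node, while hand 2 (bid 3D) is trained as “unknown”.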
Table 3. Sample hands generated by a program.

No.  Hand (S)-(H)-(D)-(C)   Points   Desired bid (by authors)
 1   K753-KJ8-K87-K76       13       1C
 2   Q9-3-KQJ8542-QT4       10       3D
 3   863-KQJT954-K9-5        9       3H
 4   T853-KT83-A9-AQ7       13       1S
 5   J62-AJ5-K652-942        9       P
 6   7-AQT976-874-K85        9       2H
 7   Q87-KQJT63-92-95        8       2H
 8   A98-K2-AQJ62-AT6       18       1N
 9   QT75-A73-J3-AKJ8       15       1C
10   T75-K52-AK86-QJ7       13       1D
11   Q94-QJ832-A95-AK       16       1H
12   AKT7642-Q53-8-T5        9       3S
13   J986-72-AQ543-J8        8       P
14   A7-K94-KT65-AK74       17       1N
15   A92-4-AK82-AJ832       16       1C
16   QT-76-AQJ752-AJ3       14       1D
17   A-KQJ9873-J98-T6       11       1H
18   AK842-AKT93-6-82       14       1S
19   AQJT-QT4-AJ-QT92       16       1N

Bids in the table were made by the authors, to illustrate the preparation of the training set. This training set contains some less frequent hands which were generated specially to ease learning.


5. Network training 

This section describes the development of the network architecture. The training algorithm used was the backpropagation algorithm. The network architecture was evolved by a trial-and-error process. Initially we started with a network aimed at capturing all the bids. These larger networks failed to converge. We therefore pruned the network, first reducing the number of output nodes, and thus designed smaller networks with fewer output nodes. The process of pruning suggests a method to evolve a set of modular networks organised in a hierarchical fashion, where each module specialises in a subset of bids. In the following, we describe our trial experiments for evolving a suitable architecture for the bidding problem.

5.1 Experiment 1 

An approach using 13 output nodes to capture all the bids was explored. Seven nodes were assigned to the 7 levels of bids, five to the denomination (the four suits and no trumps) and one node to the “pass” bid. Thus, for each input except a “pass” hand, two output nodes are expected to be activated, one for the level of the bid and the other for the denomination. The training set consisted of 900 hands. Different network architectures were examined, including:

(a) 30 and 20 nodes in the 2 hidden layers,

(b) 30 and 6 nodes in the 2 hidden layers.

It was found that the network did not converge. This was probably because there were a large number of bids (some 2-level bids and all 3- or higher-level bids) for which very few training patterns were available. Since opening bids are rarely made at levels higher than 3, quite a few of the output nodes in this representation were not used. Moreover, the two components of a bid, viz. level and denomination, may not be independent of each other. For subsequent experiments we therefore decided to use a simpler format for the output nodes, with one output node for each bid.

Table 4. Performance of the 1-level network.

Hidden nodes   tss     No. of epochs   No. of hands
40             22.20   5000            1600
45             23.80   5000            1600
50             23.01   5000            1600
55             21.40   5000            1600
60             19.12   5000            1600

Here 52 input nodes and 7 output nodes were used.
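For contrast with the one-node-per-bid format adopted later, the 13-node output coding of Experiment 1 can be sketched as follows. A bid activates one of 7 level nodes and one of 5 denomination nodes, while a pass activates a single pass node. The node ordering is an assumption.

```python
LEVEL_NODES = 7                        # nodes 0-6: levels 1..7
DENOMS = "CDHSN"                       # nodes 7-11: clubs .. no trumps
PASS_NODE = LEVEL_NODES + len(DENOMS)  # node 12

def factored_target(bid):
    """13-node target of Experiment 1: level node plus denomination node, or pass alone."""
    t = [0.0] * (PASS_NODE + 1)
    if bid == "P":
        t[PASS_NODE] = 1.0
    else:
        t[int(bid[0]) - 1] = 1.0
        t[LEVEL_NODES + DENOMS.index(bid[1])] = 1.0
    return t
```

Because the level and denomination halves are learned as separate outputs, this coding treats the two components as independent, an assumption the text above questions.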


5.2 Experiment 2 - The 1-level network 

To resolve the convergence problem, we reduced the number of output nodes to seven by restricting the bids to the 1 level only, together with the “pass” and “unknown” category bids. The network then converged. Our motivation behind this experiment was, in fact, to verify the feasibility of learning the input patterns.

The resulting network consists of 52 input nodes, 7 output nodes and one hidden layer. The number of nodes in the hidden layer was varied to study its effect on the performance. Table 4 gives the total sum of squared errors (tss) for 40, 45, 50, 55 and 60 hidden nodes. The network with 60 hidden nodes gave the highest accuracy, 92% correct bids on the test set, when compared with the bids made by an expert player. The test data consisted of 200 randomly generated hands. Some test results are given in appendix B and in table 5. It should be noted that, while evaluating the performance, an output of the network was taken as correct if the expert player also accepted it as a possible bid. Errors in the 1-level network are likely because there are many borderline hands which may fall into either the 1-level or the 2-level category. Fewer such hands are likely at the borderline of higher-level bids, such as between the 2 level and the 3 level.
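The evaluation rule just described can be sketched as follows: a network bid counts as correct if it belongs to the set of bids the expert accepts for that hand. The function name and the example bids are hypothetical.

```python
def accuracy(network_bids, acceptable):
    """Fraction of hands whose network bid is among the bids the expert accepts.

    acceptable[i] is the set of bids judged reasonable for hand i: the expert's
    own bid plus any alternative he accepts as a possible bid.
    """
    if len(network_bids) != len(acceptable):
        raise ValueError("one set of acceptable bids is needed per hand")
    if not network_bids:
        raise ValueError("no hands to evaluate")
    hits = sum(bid in ok for bid, ok in zip(network_bids, acceptable))
    return hits / len(network_bids)
```

For instance, network bids `["1C", "1H", "1S"]` judged against `[{"1C"}, {"P"}, {"1H", "1S"}]` give an accuracy of two thirds.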
Table 5. Bids made by the 1-level network.

No.  Hand (S)-(H)-(D)-(C)   Points   Expert bid   1-level network bid
 1   KJ6-Q62-K94-A864       13       1C           1C
 2   -4-AT942-AKJ8974       12       U            1C
 3   J8-KQT643-KQ92-T       11       P            1H
 4   K64-AQ874-AT953        13       1C           1C
 5   AKQT9-J765432-9-       10       U            P
 6   AT63-KQJT6-AT7-3       14       1D           1D
 7   AQT87-K4-AKT4-Q7       18       1S           1S
 8   AKJ9542-QJ852-6        11       U            U
 9   62-AT3-J942-AKT7       12       1C           1C
10   Q653-AKJ8-AJ-974       15       1H           1S

Strong unbalanced hands were labelled “unknown” (U) for training purposes. However, sometimes (e.g. example 2) the system did better by opening 1C. Also, in example 3, the network’s bid seems to be better! The discrepancy in the last example is also typical of human players.


5.3 Experiment 3 - The 2-NT network

We now consider a network that includes the 2-level bids. Here there are 12 output nodes, one each for P, 1C, 1D, ..., 1N, 2C, 2D, ..., 2N and U.

Initially we attempted to train the network with a training set conforming to the theoretical distribution of hand patterns, but the network could not be trained. The reason is that the network was unable to learn the patterns for the 2C, 2D, 2H, 2S and 2N bids, since these are very rare. Getting suitable samples of such hands would require very large training sets. Instead, we decided to insert the rare patterns selectively into the training set. Nearly 250 hands for these 2-level bids were added to nearly 1350 hands for the 1-level bids, giving 1600 hands to train the network.

This network was trained using five different architectures, with 40, 45, 50, 55 and 60 hidden nodes. The performance of the network on this training set is shown in table 6.

Results produced by the network are given in appendix B and in table 7. The network bid correctly for about 92% of the test hands, which were not part of the training set. On the training set hands, the performance of the networks varied from 95% to 100%.

Initially, we had planned to give only positive samples of hands for the bids which we wanted the system to make. However, we found that for the system to perform well, we also had to give a large number of hands for which we did not want the system to make a bid. We therefore introduced all those hands under the bid “unknown”. The performance of the resulting network is shown in table 7.
Table 6. Performance of the 2-NT network.

Hidden nodes  tss     No. of epochs  No. of hands
40            11.01   5000           1600
45            12.78   5000           1600
50            10.52   5000           1600
55            11.35   5000           1600
60            8.14    5000           1600

Here 52 input nodes and 12 output nodes were used; tss is the total sum of squared errors over the training set.
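The text does not say how a hand is mapped onto the 52 input nodes or how the 12 output nodes are read. A natural choice, assumed here purely for illustration, is one input node per card (suits in the order S, H, D, C, ranks from A down to 2) and one output node per bid, with the most active output taken as the network's bid. The sketch below encodes a hand and a target bid under this hypothetical ordering.

```python
SUITS = "SHDC"
RANKS = "AKQJT98765432"  # assumed node order within a suit
BIDS = ["P", "1C", "1D", "1H", "1S", "1N",
        "2C", "2D", "2H", "2S", "2N", "U"]  # the 12 output nodes


def encode_hand(hand: str) -> list:
    """Map an (S)-(H)-(D)-(C) hand onto 52 binary input nodes."""
    fields = hand.replace(" ", "").split("-")
    if len(fields) != len(SUITS):
        raise ValueError(f"expected 4 suit fields: {hand!r}")
    x = [0] * 52
    for s, cards in enumerate(fields):
        for card in cards:
            x[s * 13 + RANKS.index(card)] = 1  # ValueError on an unknown symbol
    if sum(x) != 13:
        raise ValueError(f"a hand must hold 13 distinct cards: {hand!r}")
    return x


def encode_bid(bid: str) -> list:
    """One-hot target vector over the 12 output nodes."""
    y = [0] * len(BIDS)
    y[BIDS.index(bid)] = 1
    return y


def decode_output(activations) -> str:
    """Read the network's bid as the most active output node."""
    best = max(range(len(BIDS)), key=lambda i: activations[i])
    return BIDS[best]


if __name__ == "__main__":
    x = encode_hand("KJ6-Q62-K94-A864")
    print(sum(x), encode_bid("1C"))
```

The 13-card check also catches a card listed twice, since a repeated card sets the same node only once.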


5.4 A proposed architecture 

It can be clearly observed that smaller networks are easier to train and, consequently, also perform better. On the other hand, looking at the task environment, one can see that all the bids made at higher levels are specialized. In addition, they deal with hands that are less frequent. This is consistent with associating with each bid a cost proportional to the level at which it is made. Hands with four-card and five-card suits are the most common (80%), and bidding systems are designed to use the cheaper (low level) bids for these hands. Designing a complete ANN system would require sufficient training samples for hands of all patterns. It therefore appears reasonable to treat the high level specialized bids as exceptions and to train different networks to deal with them. One would then have a modular network structure, each module catering to a specialized situation. These modules can be organized hierarchically, so that control flows from the exceptions to the general hands.


Table 7. Bids made by the 2-NT network.

No.  (S)-(H)-(D)-(C)      Points  Expert bids  2-NT network
1    KJT4-QT9762-A-T3     10      P            1H
2    A8-QT2-KQJ97532-     12      1D           1D
3    7-QJ98-QJ8752-T3     6       P            P
4    Q-KQJ852-73-K983     11      P            2H
5    K94-A5-AJT743-74     12      1D           1D
6    AK8742-J4-K852-9     11      1S           1S
7    AT9872-5-K54-643     7       2S           2S
8    A-K82-QJ9642-Q97     12      P            1D
9    J6-JT-8-AKQ98654     11      3N           1C
10   84-KJT9542-Q8-A3     10      2H           2H

Many experts would open hand 4 with 2H, because its key feature, a long solid suit, is present. In hand 8 the system has in fact done better by opening 1D. In hand 9 it possibly had to choose between “unknown” and 1C, since it does not know the 3N bid, which is very specialised.


Figure 2. A hierarchically structured neural network. Specialized modules, which can be trained individually, look for specific patterns before passing on control to the next module in the hierarchy. The input hand is fed into each module.
(Diagram labels: input hand, 52 nodes; output nodes P, 1C, 1D, 1H, 1S, U.)

A possible hierarchical architecture is shown in figure 2. Each network in the hierarchy looks for specific patterns. For example, the network for the 3-level weak bids looks for a seven-card suit with no other four-card suit and about 7 to 10 high card points, most of them in the bid suit. It either gives one of the bids 3C, 3D, 3H, 3S or classifies the hand as unknown. If the “unknown” output node is activated, it in turn activates the following network. In this way, control flows in cascade from the specialized networks to the more general ones. Preliminary investigations suggest that the specialized networks can be trained easily with much smaller sets of hands.
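The cascaded flow of control can be sketched as follows. Each module maps a hand to either a bid or "U", and the first module that does not answer "U" decides the bid. In place of a trained network, the hypothetical `weak_three_module` encodes the pattern described above as explicit rules: a seven-card suit, no other four-card suit, and 7 to 10 high card points with more than half of them in the long suit. These thresholds are illustrative. In the proposed architecture each module would be a separately trained network, and `general_module` is only a placeholder for the 1-level network.

```python
HCP_VALUE = {"A": 4, "K": 3, "Q": 2, "J": 1}


def suits_of(hand: str) -> dict:
    fields = hand.replace(" ", "").split("-")
    if len(fields) != 4:
        raise ValueError(f"expected 4 suit fields: {hand!r}")
    return dict(zip("SHDC", fields))


def hcp(cards: str) -> int:
    return sum(HCP_VALUE.get(c, 0) for c in cards)


def weak_three_module(hand: str) -> str:
    """Rule-based stand-in for the 3-level weak-bid network."""
    suits = suits_of(hand)
    total = sum(hcp(cards) for cards in suits.values())
    for suit, cards in suits.items():
        others = [c for s, c in suits.items() if s != suit]
        if (len(cards) == 7
                and all(len(c) < 4 for c in others)
                and 7 <= total <= 10
                and hcp(cards) > total / 2):
            return "3" + suit
    return "U"


def general_module(hand: str) -> str:
    # Placeholder for the trained 1-level network at the bottom of the cascade.
    return "P"


def cascade(hand: str, modules) -> str:
    """Pass control down the hierarchy until some module commits to a bid."""
    for module in modules:
        bid = module(hand)
        if bid != "U":
            return bid
    return "U"


if __name__ == "__main__":
    for hand in ["KQJT987-52-Q3-32", "2-JT86-AQT9864-Q"]:
        print(hand, cascade(hand, [weak_three_module, general_module]))
```

The second hand (from table B1) has a seven-card diamond suit but also four hearts, so the specialized module answers "U" and control passes to the next module.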

However, it is not clear whether a hierarchical architecture could be developed based only on these specialized networks. There are several issues in training a network made of many modules. One is the flow of control and the interaction between different modules. The different modules may not be independent of each other, and there may be overlap among the output classes. It is also not clear how the training sets for the different modules should be organized.


6. Conclusions

The studies reported in this paper clearly demonstrate that a neural network can be trained to capture the implicit reasoning used for bidding a hand in bridge. From the above experiments, the following points are worth noting:

a) A large network is difficult to train. 

b) A 1-level network performs well. 

c) Even a 2-NT network does not learn well because of lack of data for 2-level bids. 

d) But if specific data are added for the 2-level bids, then the network may perform well. 

e) This leads to the idea of a modular hierarchical network. It may be possible to optimize 
the network design taking into consideration probabilities of input-output patterns. 

f) Finally, a more sophisticated network architecture is required to construct a system that bids hands through a complete auction, because raw information (the hand) must be combined with processed information (the bids).

The present study clearly brings out several interesting research issues to be explored using neural network architectures. The first issue is the representation of the input data. In situations like card games, representation in raw form appears preferable, as any feature representation is likely to be subjective and may result in loss of information. In contrast, in problems dealing with speech and image data, it is essential to represent the data in a manner that reflects auditory and visual sensory processing, respectively. Errors in the feature representations are usually responsible for poor generalization performance in pattern recognition tasks involving speech and images. The second major issue is training a network with patterns occurring with widely different probabilities. This is a difficult issue in many practical problems, including, for example, speech, where different speech sounds occur with widely different probabilities.
Appendix A. A note on contract bridge: the problems

Contract bridge is played with a regular pack of 52 cards dealt randomly and equally among 4 players. Let us call them North, South, East and West, according to their positions at the table. North and South are partners, as are East and West. The cards in each suit are ranked in the order Ace, King, Queen, Jack, 10, 9, ..., 2. Each player in turn, in clockwise order, plays a card, following the suit led if possible. The highest card of the suit led wins the trick, unless a trump has been played, in which case the highest trump wins. Thirteen such tricks are played, the winner of each trick leading to the next. This constitutes one deal or one hand.
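In standard bridge play, a trick is won by the highest trump played or, if no trump was played, by the highest card of the suit led. That rule can be stated precisely in a few lines of Python. The card notation (a suit letter followed by a rank, e.g. "HA" for the ace of hearts) and the function name are assumptions of this sketch, which also presumes that the cards passed in form a legal trick.

```python
RANK_ORDER = "23456789TJQKA"  # ascending: 2 lowest, ace highest


def trick_winner(plays, trump=None):
    """plays: list of (player, card) in the order played; card = suit + rank.

    Returns the player whose card wins: the highest trump if any trump was
    played, otherwise the highest card of the suit that was led.
    """
    led_suit = plays[0][1][0]

    def strength(play):
        suit, rank = play[1][0], play[1][1]
        if trump is not None and suit == trump:
            return (2, RANK_ORDER.index(rank))
        if suit == led_suit:
            return (1, RANK_ORDER.index(rank))
        return (0, 0)  # a discard from another suit can never win

    return max(plays, key=strength)[0]


if __name__ == "__main__":
    trick = [("N", "H5"), ("E", "HK"), ("S", "HA"), ("W", "S2")]
    print(trick_winner(trick), trick_winner(trick, trump="S"))
```

With no trump suit, South's ace of hearts wins; with spades as trumps, West's two of spades wins instead.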

There are two stages in each deal, viz. bidding, followed by the play of the cards. The goal in a deal is to maximize points, and the points essentially depend upon the bidding. Bids are made for the number of tricks a side promises to make, given the stated “trump” suit. Eventually the highest bid in each deal is accepted; this is known as the contract. Generally, the higher a side bids, the more points it is likely to win, provided it can fulfil the contract. That is, if the side makes the number of tricks it has bid for, it wins some points. Let us call them success-points. If it fails, the opponents get some points instead, which we can call penalty-points.

The straightforward goal in bidding is to bid the highest number of tricks one thinks the side can make, that is, to maximize the success-points won. The means used in this process are the following:

(a) Evaluation of own hand. 

(b) Communication with partner. 

(c) Projection of play. 

Of these, the first two are simpler and can possibly be handled by heuristic methods. 
The third is more difficult, as it would involve constructing plausible distributions (based 
on the bids heard, and on probability) and then projecting the play. Using neural nets we 
hope to implicitly capture all three components for the opening bid situation. 

A more complex goal is to make a sacrifice bid. This essentially means intentionally overbidding an opponent's bid, in the hope that the loss in penalty-points will be smaller than the opponents’ expected gain in success-points, thus resulting in an overall gain.

An even more complex goal is to sabotage the opponents’ communication. This may mean consuming the bidding space (jamming the communications channel), or even making “false” bids to confuse the opponents. In the process, an enterprising planner may make an “advance sacrifice” to “push” the opponents higher than they can manage, or to escape with a lighter penalty.

Considering that all these processes happen when the planner can see only one hand, 
one observes that bidding is probably the more difficult part of the game. 

Once bidding is over, the goal for the play stage has been defined. One side has the contract and is required to make the number of tricks bid. At this stage one player of the contracting side (called the dummy) exposes his cards to everybody, while the other (called the declarer) plans and executes the play. The opposing side (called the defenders) is said to defend the contract, trying in fact to prevent the declarer from fulfilling it.

One can see that the situation at this stage is not symmetric. The declarer knows the entire strength of his side and is in total control of the play of the cards. He is also aware of the entire assets of the defence, in terms of material strength, since they hold the remaining cards. Each defender knows only his own hand and cannot see his partner’s hand; therefore the two defenders have to combine their efforts to try to achieve their goal. This necessarily involves (formal) communication between the two. Both can see the dummy too.

Since not all the cards can be seen, one cannot project moves into the future; methods like minimax search are therefore ruled out immediately. Instead, the success of a strategy can only be estimated based on the probabilistic distribution of cards and on information gleaned from the communication taking place. The strategies themselves are derived from knowledge about the various known methods of tackling different card combinations.

The straightforward goal in the play of the hand is to make the number of tricks stated in the contract. The emphasis is on maximizing the probability of success. If success is assured, the goal can be revised to increase the number of tricks won, as some more points can then be gained. If success seems unlikely, a planner may even choose to minimize losses, i.e., the penalty-points won by the opponents. As in bidding, the planner may attempt to do better than par by exploiting the incomplete information that the opponents have. This may introduce complex “meta-level” goals of protecting information or sending out misleading signals.

Thus, we see that unlike games like chess, where a clear-cut strategy of aiming for the minimax value (saddle) points is meaningful, in bridge one has largely to grapple with incomplete information. In the face of such uncertainty, planning in bridge can only be a complex, knowledge intensive activity.

Appendix B. Sample outputs from bidding networks

The bids made by the 1-level and 2-NT networks for some hands are shown in table B1. Also included are bids made earlier by two players. It can be seen that the networks perform well for most of the hands.
Table B1. Expert bids and the bids made by the 1-level and 2-NT networks for the same hand.

(S)-(H)-(D)-(C)      Points  Expert bids  1-level network  2-NT network
AT5-J983-K5-AJT5     13      1C, 1H       1C               1H
AQ6-A752-AT2-KQ3     19      1C, 1H       U                1C
A95-AQ9-543-AQ93     16      1C, 1N       1N               1C
K85-K94-KQJ94-96     12      P, 1D        1C**             1D
T-AK6-J8642-QJ63     11      P, 1D        P                1D
KJ3-T-QT9643-AQ9     12      P, 1D        1D               1D
AQ84-AJ984-J92-9     12      P, 1H        1S               P
AQ8-AQJ6-63-AJ94     18      1H, 1N       1C               1C
974-A732-AK6-QJ6     14      1C, 1H       1C               1H
8753-4-AT5-AKT98     11      P, 1C        P                P
98-AK6-K854-AK43     17      1C, 1N       1C               1C
QJ72-K743-K9-A54     13      1S, 1H       1S               1S
Q72-A98-Q64-KJ92     12      P, 1C        P                P
JT4-J8-AKQ6-AQ64     17      1C, 1N       1N               1C
AKQ97-52-T3-9853     9       P, 2S        U                2S
K843-A64-A52-AT3     12      P, 1S        1S               1S
K832-AKJ3-QT5-K6     16      1N, 1H       1S               1H
-Q98762-KQ753-Q3     9       P, 1H        P                P
8-QJ5-A74-KJ9875     11      P, 1C        P                P
K8-AT94-JT7-AT65     12      P, 1H        1H               P
97-5-AKQ8754-AK2     16      2C, 1D       1D               1D
96542-AQJ2-AKQ7      16      1S, 1D       1S               1C
AJ75-4-KJT6-AKT8     16      1D, 1C       1C               1D
A6-QJ86-J94-KQJ9     14      1H, 1C       1C               1C
T8-AT5-K976-A874     11      1N, P        1C               1C
KQ-QJ7642-QJ92-T     11      P, 1H        1H               1H
K87-J8-AKQ8-AKQ2     22      1C, 2N       U                2N
KT84-KQT7-K63-T2     11      P, 1S        P                P
AJ-Q64-J42-AKJ85     15      1N, 1C       1C               1C
A632-3-96-KQJ863     10      P, 1C        1C               P
965-AT8762-J7-A9     9       1H, P        P                P
Q7-A74-A2-KQJ972     16      1D, 1C       1C               1C
3-T6-KT74-AQJT74     10      P, 1C        P                P
AKJ3-982-J85-KQ5     14      1S, 1C       1S               1C
92-KQ6-A93-AQJ73     16      1N, 1C       1C               1C
94-KQ3-AQT2-J632     12      P, 1D        1D               1D
-A874-AJT52-AQT5     15      1C, 1D       1D               1D
2-JT86-AQT9864-Q     9       P, 3D        U                P

(Continued)
Table B1. Continued.

(S)-(H)-(D)-(C)      Points  Expert bids  1-level network  2-NT network
K95-94-A852-AT83     11      P, 1D        P                P
Q-A743-AJ5-A8732     15      1H, 1C       1C               1H
-A642-AKT83-J943     12      1H, 1D       1H               P
AJ76-AQ73-75-J87     12      1S, 1H       1S               1S
AQJ73-KT85-42-72     10      1S, P        P                1S
KQ98754-83-Q7-A2     11      1S, 2S       1S               2S
QJ6-Q-KT864-A854     12      1D, P        1D               1D
743-Q65-AKT7-KQ9     14      1C, 1D       1N               P
J4-AJ654-J53-KJ9     11      P, 1H        P                P
K8653-3-7-AKJ753     11      3C, 1C       1C               1S
AT5-AK-Q9532-KQ3     18      1N, 1D       U                1N
A3-AT2-KT64-AKQJ     21      2C, 1C       U                1N**
75-A3-AQT632-AK5     17      1N, 1D       1D               1D
K53-QJ3-QJ8753-3     9       3D, P        U                2D
Q-KQ65-AKQ74-Q43     18      1N, 1D       1H               1D
A7-K8643-K863-A8     14      1S, 1H       1H               1H
K-AJ8-AK965-QT87     17      1N, 1D       U                1N
A943-Q432-Q52-A2     12      1C, P        P                1S
K87-AT9-J87-A972     12      1C, P        P                1C
JT86-AK43-76-AQ7     14      1S, 1H       1S               1S
982-Q652-AKQT-K5     14      1H, 1D       1D               1H
KQJT85-A96-743-8     10      3S, 2S       P                2S
AT982-QT2-A3-J65     11      1S, P        1S               1S
AJ32-AQ93-5-AQJ2     18      2C, 1C       U                1N**
76-AKQJ653-7-KQ9     15      1H, 4H       1H               1H
5-AJT8753-K4-K93     11      3H, 1H       1H               1H

** These bids are incorrect.
Abstract. Recently, efficient scheduling algorithms based on Lagrangian relaxation have been proposed for scheduling parallel machine systems and job shops. In this article, we develop real-world extensions to these scheduling methods. In the first part of the paper, we consider the problem of scheduling single operation jobs on parallel identical machines and extend the methodology to handle multiple classes of jobs, taking into account setup times and setup costs. The proposed methodology uses Lagrangian relaxation and simulated annealing in a hybrid framework. In the second part of the paper, we consider a Lagrangian relaxation based method for scheduling job shops and extend it to obtain a scheduling methodology for a real-world flexible manufacturing system with centralized material handling.

Keywords. Jobshop scheduling; Lagrangian relaxation; simulated annealing; 
flexible manufacturing systems; weighted tardiness; subgradient methods. 


1. Introduction 

The problem of scheduling arises in situations where scarce resources have to be optimally allocated to activities over time. Most scheduling problems belong to the class of NP hard combinatorial optimization problems. Any scheduling methodology should aim to (Luh et al 1990)

(i) generate near optimal solutions efficiently, with measurable performance,

(ii) facilitate rapid “what if” analysis to examine the impact of dynamic changes, and

(iii) support efficient methods for schedule reconfiguration to accommodate these changes.
In the area of discrete activity scheduling, it is generally accepted that a gap exists between scheduling theory and practice. Practical methods react to dynamic changes without the ability to produce good solutions, while theoretical methods produce good schedules without the ability to react to dynamic changes. Recently, Luh et al (1990) and Hoitomt et al (1990, 1993) have developed a Lagrangian relaxation based suboptimal algorithm for scheduling non-preemptive single/multi-operation jobs on parallel identical machines and for job shop scheduling. Their method performs very well in a wide variety of scheduling situations and is also amenable to extensive “what-if” analysis.

In this paper, we consider the above scheduling methodologies for parallel identical machines (Luh et al 1990) and for job shops (Hoitomt et al 1993) and extend them to take into account real-world features.

1.1 Extension to a multiclass environment 

The scheduling methodology for parallel identical machines developed by Luh et al (1990) does not take into account setup times and setup costs, which are very important in scheduling multiclass manufacturing systems. The first part of our work extends the scheduling methodology to multiclass production systems comprising parallel identical machines, taking setup times and setup costs into account.

In a multiclass production setting, the jobs are divided into a number of mutually exclusive part types. Setup operations are an important feature of such production environments. A significant setup time is incurred when a machine changes from processing one type of part to a different type. The setup time generally includes times for fixturing, tool changing and preparing the workplace. Since the setup operations do not contribute to productivity, a setup cost is also incurred. To minimize setup times and costs, a batch of products belonging to the same part type is manufactured after a single setup. Large batch sizes, on the other hand, result in high inventory levels. The economic lot sizing problem (ELSP) (Fleishmann 1990) addresses this problem of minimizing the sum of inventory and setup costs. The problem is known to be NP hard (Lawler et al 1989).

The extension proposed here follows a hybrid approach that combines the techniques of 
Lagrangian relaxation (Fischer 1973, 1981; Luenberger 1984) and simulated annealing 
(Kirkpatrick et al 1983; Aarts & Van Laarhoven 1985; Van Laarhoven et al 1992). The 
objective is to minimize the sum of the total weighted tardiness and setup costs (assumed 
to be a monotonically increasing function of the setup times). 

1.2 Extension to an FMS

Scheduling is an important issue in the planning and operation of flexible manufacturing systems. Hoitomt et al (1993) propose a Lagrangian relaxation based scheduling methodology for job shops. However, it cannot be applied directly to the scheduling of a typical FMS. In the second part of this paper, we describe the scheduling problem of a particular flexible manufacturing system and develop an extension of the approach presented by Hoitomt et al (1993) to schedule jobs on the machines and the material handling equipment of the given FMS. The objective is to minimize the total quadratic weighted tardiness of the schedule. The problem is believed to be NP hard, and the aim here is only to design an efficient suboptimal algorithm whose performance is measured with the help of a lower bound.

1.3 Organization of the paper 

The next section is a survey of the relevant literature, in two parts. Section 2.1 deals with the scheduling of jobs in a single class production environment as described by Luh et al (1990). It essentially summarises the integer programming formulation of the scheduling problem and the solution methodology. Section 2.2 describes the job-shop scheduling problem and presents the Lagrangian relaxation approach of Hoitomt et al (1990, 1993). Section 3 proposes a hybrid methodology for a multiclass parallel identical machine problem with setup times included. The proposed methodology employs simulated annealing and Lagrangian relaxation. Three examples are discussed to demonstrate the working of the proposed methodology, and detailed numerical results are provided. Section 4 describes a particular FMS and our extended methodology for scheduling the resources of the FMS. Numerical results are also provided. Section 5 presents conclusions and directions for future work.

2. Scheduling methods based on Lagrangian relaxation 

2.1 The case of parallel identical machines 

Lagrangian relaxation (Fischer 1973, 1976, 1981; Luenberger 1984) provides an efficient way of scheduling independent jobs with due dates on identical parallel machines. The special integer programming formulation facilitates the application of the Lagrangian relaxation technique. Decomposition of the dual problem simplifies the solution at the lower level. The high level problem is solved via a subgradient method. Dynamic changes can easily be accommodated in this approach. In this section, we review the Lagrangian relaxation technique as applied to scheduling non-preemptive, single operation jobs on parallel identical machines. The material is mostly taken from the paper by Luh et al (1990).

2.1a Problem formulation An integer programming formulation as described by Luh 
et al (1990) is a common way to represent a scheduling problem. The following is a static, 
discrete time, integer programming formulation of the scheduling problem. We shall use 
the following notation. 

$N$ total number of jobs,
$i$ index of jobs, $i = 1, 2, \ldots, N$,

$K$ time horizon under consideration,
$k$ index of time, $k = 1, 2, \ldots, K$,

$w_i$ weight of job $i$,
$t_i$ processing time of job $i$,

$D_i$ due date of job $i$,

$M_k$ number of machines available at time $k$,
$b_i$ beginning time of job $i$,
$c_i$ completion time of job $i$,
$T_i$ tardiness of job $i$, $T_i = \max(0, c_i - D_i)$,

$\delta_{ik}$ integer variable, equals 1 if job $i$ is active at time $k$, and 0 otherwise,

$J$ objective function to be minimized.

Among the above variables, the number of jobs $N$, the time horizon $K$, the weights of jobs $\{w_i\}_{i=1}^{N}$, the time requirements $\{t_i\}_{i=1}^{N}$, the due dates $\{D_i\}_{i=1}^{N}$ and the machine availability $\{M_k\}_{k=1}^{K}$ are assumed to be given. Job processing is non-preemptive, so that a contiguous block of time of length $t_i$ is needed to process job $i$. The decision variables are $\{b_i\}_{i=1}^{N}$. Once the $b_i$'s are selected, $\{c_i\}_{i=1}^{N}$, $\{T_i\}_{i=1}^{N}$ and $\{\delta_{ik}\}_{i=1,k=1}^{N,K}$ can easily be derived. For example, for $b_i = 2$, $t_i = 3$ and $D_i = 3$, we have $\delta_{i2} = \delta_{i3} = \delta_{i4} = 1$, $c_i = b_i + t_i - 1 = 4$, and $T_i = c_i - D_i = 1$. We also assume, for the sake of simplicity, that all jobs are available for processing at time 1 (this can easily be relaxed) and that the time horizon $K$ is long enough to complete all the jobs.
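The derivation of $c_i$, $T_i$ and $\delta_{ik}$ from a chosen beginning time is mechanical. The small Python sketch below reproduces the worked example (the function name is ours, and time is indexed from 1 as in the text).

```python
def derive_job_quantities(b, t, D, K):
    """Given beginning time b, processing time t, due date D and horizon K,
    return the completion time c, the tardiness T and the activity
    indicators delta[k-1] for k = 1, ..., K (non-preemptive processing)."""
    if not 1 <= b <= K - t + 1:
        raise ValueError("job does not fit inside the horizon")
    c = b + t - 1                      # processing time constraint
    T = max(0, c - D)                  # tardiness
    delta = [1 if b <= k <= c else 0 for k in range(1, K + 1)]
    return c, T, delta


if __name__ == "__main__":
    print(derive_job_quantities(b=2, t=3, D=3, K=6))
```

For $b_i = 2$, $t_i = 3$, $D_i = 3$ and a horizon of six slots, the call returns `(4, 1, [0, 1, 1, 1, 0, 0])`, i.e. the job is active at $k = 2, 3, 4$, as stated above.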

The objective function of interest is

$$J = \sum_{i} w_i T_i. \tag{1}$$

Such an objective function accounts for the weight of jobs, the importance of meeting due dates, and the fact that completing a job becomes increasingly critical with each time unit after its due date has passed (Luh et al 1990). A static and deterministic parallel machine scheduling problem can now be formulated as follows.

$$P:\ \min_{\{b_i\}} J = \sum_{i} w_i T_i, \tag{2}$$

subject to capacity constraints

$$\sum_{i} \delta_{ik} \le M_k \quad (k = 1, 2, \ldots, K), \tag{3}$$

and processing time constraints

$$c_i - b_i + 1 = t_i \quad (i = 1, 2, \ldots, N). \tag{4}$$

Note that in (4), 1 must be added to $c_i - b_i$ to obtain $t_i$, in view of the definitions of $b_i$ and $c_i$.
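The following sketch makes the roles of (2)-(4) concrete by evaluating a candidate set of beginning times. It derives each $c_i$ from (4), accumulates the machine load $\sum_i \delta_{ik}$ at every $k$, and reports the weighted tardiness $J$ of (1) together with whether the capacity constraints (3) hold. The data are hypothetical. The second call shows a schedule with lower $J$ that violates (3), which is precisely the kind of solution that relaxing the capacity constraints can produce.

```python
def evaluate_schedule(b, t, D, w, M):
    """Return (J, feasible) for beginning times b; M[k-1] is the number of
    machines available at time k, so the horizon is K = len(M)."""
    K = len(M)
    load = [0] * K
    J = 0
    for bi, ti, Di, wi in zip(b, t, D, w):
        ci = bi + ti - 1                       # constraint (4)
        if bi < 1 or ci > K:
            raise ValueError("job falls outside the horizon")
        for k in range(bi, ci + 1):
            load[k - 1] += 1                   # delta_ik = 1 while job i is active
        J += wi * max(0, ci - Di)              # objective (1)
    feasible = all(n <= m for n, m in zip(load, M))   # constraints (3)
    return J, feasible


if __name__ == "__main__":
    t, D, w, M = [2, 2, 1], [2, 2, 2], [1, 2, 3], [2, 2, 2]
    print(evaluate_schedule([1, 1, 3], t, D, w, M))   # (3, True)
    print(evaluate_schedule([1, 1, 2], t, D, w, M))   # (0, False)
```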

The single machine sequencing problem can be solved as a weighted bipartite matching 
problem that is NP hard (Lawler et al 1989). Consequently, the parallel machine weighted 
tardiness problem is also NP hard. The additivity of the objective function facilitates the 
decomposition approach. 

2.1b Solution methodology Relaxing the capacity constraints (3) with Lagrange multipliers $\pi_k$ ($k = 1, 2, \ldots, K$) forms the relaxed problem

$$R:\ \min_{\{b_i\}} L, \qquad L \equiv \sum_{i} w_i T_i + \sum_{k} \pi_k \Big( \sum_{i} \delta_{ik} - M_k \Big), \tag{5}$$

subject to (4). The dual problem is

$$D:\ \max_{\pi} L^*, \tag{6}$$

with

$$L^* = -\sum_{k} \pi_k M_k + \min_{\{b_i\}} \sum_{i} \Big( w_i T_i + \sum_{k} \pi_k \delta_{ik} \Big), \tag{7}$$

subject to (4) and

$$\pi \ge 0. \tag{8}$$

This leads to the following decomposed subproblem for each job $i$ (given $\pi$):

$$R_i:\ \min_{1 \le b_i \le K - t_i + 1} L_i, \tag{9}$$

with

$$L_i = w_i T_i + \sum_{k} \pi_k \delta_{ik}, \tag{10}$$

subject to

$$c_i - b_i + 1 = t_i. \tag{11}$$

$K$ is assumed to be large enough to complete all the jobs.
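Because the relaxed problem separates by job, each subproblem (9)-(11) can be solved by simple enumeration over the admissible beginning times. The sketch below does this and evaluates the dual function (7) for a given multiplier vector. It is an illustrative implementation assuming the linear tardiness cost of (10), with hypothetical data, not the code of Luh et al (1990). By weak duality, the value returned is a lower bound on the optimal $J$ for any $\pi \ge 0$.

```python
def solve_subproblem(t_i, D_i, w_i, pi):
    """Minimise L_i = w_i*T_i + sum_k pi_k*delta_ik over 1 <= b_i <= K-t_i+1.
    pi[k-1] is the multiplier for time k. Returns (min L_i, best b_i)."""
    K = len(pi)
    best = None
    for b in range(1, K - t_i + 2):
        c = b + t_i - 1
        cost = w_i * max(0, c - D_i) + sum(pi[b - 1:c])  # pi_k for k = b..c
        if best is None or cost < best[0]:
            best = (cost, b)
    return best


def dual_value(t, D, w, M, pi):
    """Evaluate (7): -sum_k pi_k M_k plus the sum of the subproblem minima."""
    value = -sum(p * m for p, m in zip(pi, M))
    starts = []
    for t_i, D_i, w_i in zip(t, D, w):
        cost, b = solve_subproblem(t_i, D_i, w_i, pi)
        value += cost
        starts.append(b)
    return value, starts


if __name__ == "__main__":
    t, D, w, M = [2, 2, 1], [2, 2, 2], [1, 2, 3], [2, 2, 2]
    print(dual_value(t, D, w, M, [0, 0, 0]))   # (0, [1, 1, 1])
    print(dual_value(t, D, w, M, [1, 0, 0]))   # (0, [1, 1, 2])
```

With $\pi = 0$ every job starts at time 1 and the capacity at $k = 1, 2$ is exceeded. Pricing time 1 moves the unit-length job to time 2, which illustrates how the multipliers steer the subproblems towards feasibility.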

For convex programming problems, the maximum of the dual cost equals the minimum of the original objective function and a saddle point exists. However, there are several difficulties in utilizing this technique for solving discrete variable problems. First, a saddle point may not exist, and it may be difficult to determine when the algorithm should be terminated. Second, even if the dual optimum were obtained, the corresponding schedule may not be feasible. Heuristic adjustment is generally required to ensure that the relaxed constraints are obeyed. Therefore, the steps in obtaining a near optimum solution are


(1) solving the subproblems, 

(2) solving the dual problem, 

(3) constructing a feasible solution, and 

(4) finding a (sub) optimal solution. 

Each of these steps is discussed by Luh et al (1990). 
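Steps (2) and (3) can be sketched as follows, under stated assumptions. The multipliers are updated by a plain subgradient step $\pi_k \leftarrow \max(0, \pi_k + \alpha_n g_k)$ with $g_k = \sum_i \delta_{ik} - M_k$ and a diminishing step size $\alpha_n = \alpha_0/n$. A feasible schedule is then built by a simple greedy list heuristic that takes jobs in order of their relaxed beginning times and places each at the earliest slot with free capacity. Both the step-size rule and the heuristic are illustrative choices, not necessarily those used by Luh et al (1990). The best dual value gives a lower bound against which the cost of the feasible schedule can be compared.

```python
def solve_subproblem(t_i, D_i, w_i, pi):
    """Return (min L_i, b_i) by enumerating 1 <= b_i <= K - t_i + 1."""
    best = None
    for b in range(1, len(pi) - t_i + 2):
        c = b + t_i - 1
        cost = w_i * max(0, c - D_i) + sum(pi[b - 1:c])
        if best is None or cost < best[0]:
            best = (cost, b)
    return best


def lagrangian_schedule(t, D, w, M, iters=50, step0=1.0):
    K, N = len(M), len(t)
    pi = [0.0] * K
    best_dual, starts = float("-inf"), [1] * N
    for n in range(1, iters + 1):
        dual = -sum(p * m for p, m in zip(pi, M))
        load = [0] * K
        for i in range(N):
            cost, b = solve_subproblem(t[i], D[i], w[i], pi)
            dual += cost
            starts[i] = b
            for k in range(b - 1, b - 1 + t[i]):
                load[k] += 1
        best_dual = max(best_dual, dual)              # lower bound on J
        step = step0 / n                               # diminishing step size
        pi = [max(0.0, p + step * (load[k] - M[k]))    # subgradient g_k
              for k, p in enumerate(pi)]

    # Greedy construction of a feasible schedule from the relaxed starts.
    order = sorted(range(N), key=lambda i: (starts[i], -w[i]))
    load = [0] * K
    b_feas = [0] * N
    for i in order:
        for b in range(1, K - t[i] + 2):
            span = range(b - 1, b - 1 + t[i])
            if all(load[k] < M[k] for k in span):
                for k in span:
                    load[k] += 1
                b_feas[i] = b
                break
        else:
            raise ValueError("horizon K too short for a feasible schedule")
    J = sum(w[i] * max(0, b_feas[i] + t[i] - 1 - D[i]) for i in range(N))
    return b_feas, J, best_dual


if __name__ == "__main__":
    t, D, w, M = [2, 2, 1], [2, 2, 2], [1, 2, 3], [2, 2, 2, 2]
    b, J, lb = lagrangian_schedule(t, D, w, M)
    print("starts", b, "J", J, "lower bound", round(lb, 3))
```

The gap between `J` and the lower bound gives the kind of measurable performance that the introduction asks of a scheduling method.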

The optimized Lagrange multipliers $\pi_k$ are interpreted as shadow prices for using the resource (machine) at time $k$. They therefore reflect the sensitivity of the objective function with respect to resource levels. This can be used to answer “what if” questions and to reconfigure an existing schedule when changes occur in resource availability. Thus, Lagrangian relaxation has the ability to react effectively to dynamic changes and at the same time produce good suboptimal schedules.
2.2 The case of job shops

A discrete time, integer programming formulation of the scheduling problem is given below. The variable definitions and the constraint statements are all influenced by the work of Hoitomt et al (1990, 1993). Let $(i, j)$ refer to the $j$th operation of the $i$th job.

$N$ total number of jobs,

$i$ index of jobs, $i = 1, 2, \ldots, N$,

$K$ time horizon under consideration,
$k$ index of time, $k = 1, 2, \ldots, K$,
$w_i$ weight of job $i$,

$N_i$ number of operations of job $i$,
$j$ index of operations, $j = 1, 2, \ldots, N_i$,
$t_{ij}$ processing time of $(i, j)$,

$D_i$ due date of job $i$,

$H$ number of machine types,
$h$ index of machine types, $h = 1, 2, \ldots, H$,
$m_{ij}$ machine chosen to process $(i, j)$,
$b_{ij}$ beginning time of $(i, j)$,
$c_{ij}$ completion time of $(i, j)$,

$\delta_{ijk}$ integer variable, equals 1 if $(i, j)$ is active at time $k$, and 0 otherwise,

$C_i$ completion time of job $i$,

$M_{kh}$ number of machines of type $h$ available at time $k$,
$h_{ij}$ type of machine processing operation $(i, j)$,

$T_i$ tardiness of job $i$, $T_i = \max(0, C_i - D_i)$,

$J$ objective function to be minimized.

The precedence constraints of every job form a simple directed acyclic graph, and $C_i = c_{iN_i}$ for all $i$. All jobs are assumed to be available for processing from time 1 (not a crucial requirement). $K$ is assumed to be large enough to complete all the jobs.

Among the above variables, the number of jobs $N$, the time horizon $K$, the weights of jobs $\{w_i\}_{i=1}^{N}$, the time requirements $\{t_{ij}\}_{i=1,j=1}^{N,N_i}$, the due dates $\{D_i\}_{i=1}^{N}$ and the machine availability $\{M_{kh}\}_{k=1,h=1}^{K,H}$ are assumed to be given. Job processing is non-preemptive, so that a contiguous block of time of length $t_{ij}$ is needed to process $(i, j)$. The decision variables are $\{b_{ij}\}_{i=1,j=1}^{N,N_i}$. Once the $b_{ij}$'s are selected, $\{c_{ij}\}_{i=1,j=1}^{N,N_i}$, $\{T_i\}_{i=1}^{N}$ and $\{\delta_{ijk}\}_{i=1,j=1,k=1}^{N,N_i,K}$ can easily be derived. The objective function of interest is

$$J = \sum_{i} w_i T_i^2. \tag{12}$$
The above function accounts for the weight of the jobs, the importance of meeting due dates and the fact that a job becomes more critical with each time unit after passing its due date. Compare this function with the objective function in (1). Whether we choose $w_i T_i$ or $w_i T_i^2$ alters neither the formulation nor the solution methodology; it only assigns different weightings to the individual tardiness values. A static and deterministic job shop scheduling problem can now be formulated as follows.


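$$P:\ \min_{\{b_{ij}\}} J, \tag{13}$$

subject to capacity constraints

$$\sum_{(i,j):\, h_{ij} = h} \delta_{ijk} \le M_{kh} \quad (k = 1, 2, \ldots, K;\ h = 1, 2, \ldots, H), \tag{14}$$

processing time constraints

$$c_{ij} - b_{ij} + 1 = t_{ij} \quad (i = 1, 2, \ldots, N;\ j = 1, 2, \ldots, N_i), \tag{15}$$

and precedence constraints

$$c_{ij} + 1 \le b_{i(j+1)} \quad (i = 1, 2, \ldots, N;\ j = 1, 2, \ldots, N_i - 1). \tag{16}$$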
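A candidate job shop schedule can be checked directly against (14)-(16). The sketch below is illustrative. Each job is given as a hypothetical list of (machine type, processing time) pairs, and `M[h][k-1]` is the number of type-$h$ machines at time $k$. The tardiness exponent is a parameter, since the text notes that using $w_i T_i$ or $w_i T_i^2$ leaves the formulation unchanged.

```python
def evaluate_job_shop(ops, b, M, w, D, power=2):
    """ops[i][j] = (machine type h_ij, t_ij); b[i][j] = beginning time b_ij.
    Returns J = sum_i w_i * T_i**power, or None if (14) or (16) is violated."""
    load = {}
    J = 0
    for i, job in enumerate(ops):
        for j, (h, t_ij) in enumerate(job):
            c_ij = b[i][j] + t_ij - 1                     # constraint (15)
            if j + 1 < len(job) and c_ij + 1 > b[i][j + 1]:
                return None                               # precedence (16) violated
            for k in range(b[i][j], c_ij + 1):
                load[(h, k)] = load.get((h, k), 0) + 1
        C_i = b[i][-1] + job[-1][1] - 1                   # C_i = c_{i,N_i}
        J += w[i] * max(0, C_i - D[i]) ** power
    for (h, k), n in load.items():
        if k < 1 or k > len(M[h]) or n > M[h][k - 1]:
            return None                                   # capacity (14) violated
    return J


if __name__ == "__main__":
    ops = [[("A", 2), ("B", 1)], [("A", 1)]]
    M = {"A": [1] * 6, "B": [1] * 6}
    print(evaluate_job_shop(ops, [[1, 3], [3]], M, w=[1, 2], D=[3, 2]))  # 2
    print(evaluate_job_shop(ops, [[1, 3], [2]], M, w=[1, 2], D=[3, 2]))  # None
```

In the second call both jobs need the single type-A machine at $k = 2$, so the capacity constraint (14) fails.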


The complexity of the above scheduling problem motivates a decomposition approach. 
Hoitomt et al (1993) propose a Gauss-Seidel method based on quadratic penalty terms for 
precedence constraints (16), since relaxing those constraints with Lagrangian multipliers 
alone would cause oscillations in the values of the multipliers relaxing them and therefore 
the beginning times bij . The oscillation phenomenon is due to nondecomposability of J 
with respect to the operations of each job. 


3. Multiclass jobs on parallel identical machines 

In a multiclass production system, switchover times or setup times can have a significant effect on the way parts are scheduled. The jobs of a given part type need not be processed together. It is desired to find a schedule that minimizes the sum of the weighted tardiness and switchover costs.

Several complications arise with the introduction of switchover times. The Lagrangian relaxation technique of Luh et al (1990) cannot be directly applied because

• For every job $j$, we now need to evaluate $L_{ij}^*$ and $b_{ij}^*$, where $i$ is the part type of the job that was processed immediately before $j$ ($j = 1, 2, \ldots, N$; $i = 1, 2, \ldots, P$) and $P$ is the total number of part types.

• Designing an effective greedy heuristic to arrive at a near-optimum feasible schedule at the termination of the subgradient algorithm is not easy.

To circumvent this, a hybrid approach is developed that uses simulated annealing (Van Laarhoven et al 1992) to arrive at a near optimal sequence of setup operations, and Lagrangian relaxation to arrive at the schedule of the jobs of a part type on the machines. The assumption here is that once the machines are set up for a part type, all jobs belonging to that part type are processed. The following simplifying assumptions are made regarding switchover times and costs.

(1) The switchover times are the same for all classes. 

(2) The setup costs depend only on the setup times and further, are a monotonically 
increasing function of the setup times. 

Defining the state of a machine at time t to be the type of part it is processing at t, the 
extra data necessary are the initial states of the machines and the time instants at which 
each machine first becomes available. 

First, we describe a method to arrive at the schedule of parts of a particular type on the machines. Let Q denote the total number of machines. An upper bound $K_l$ on the planning horizon for scheduling jobs belonging to class l is given by $K_l = \sum_{i \in l} t_i$ (the value obtained if all jobs are scheduled on a single machine). Let $v_i$ denote the time instant at which machine i (i = 1, 2, ..., Q) first becomes available (after the necessary setup operations). Let the permutation $(S_1, S_2, \ldots, S_Q)$ denote the sequence of machines such that $v_{S_1} \le v_{S_2} \le \cdots \le v_{S_Q}$. Determine $q = \max\{ j : v_{S_j} < v_{S_1} + K_l \}$; machines $S_{q+1}, \ldots, S_Q$ cannot process any jobs belonging to the part type under consideration. For k = 1, 2, ..., $K_l$, form $M_k$ based on $v_{S_1}, \ldots, v_{S_n}$, where n = 1, 2, ..., Q. It is here that the second assumption regarding setup costs becomes important: if two or more machines become available at the same time, any one of them can be chosen for processing the jobs belonging to the part type, which prevents unnecessary enumeration at this stage. Lagrangian relaxation is used to arrive at the schedule of jobs and its cost for each n; each of these tasks is parallelizable. The schedule for which the sum of the setup cost and tardiness cost is minimum is chosen, and the availability and states of the machines are updated accordingly.
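The machine-selection step can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes integer time units, counts time instants from 1 up to $v_{S_1} + K_l$ (our choice), takes $M_k$ to be the number of the first n machines (in order of availability) that are free at time k, and stops the enumeration at q because the remaining machines cannot be used. The function name `candidate_capacities` is hypothetical.

```python
def candidate_capacities(avail, proc_times):
    """Machine-selection step for one part type (illustrative sketch).

    avail      -- v_i, the time each machine first becomes available
    proc_times -- processing times t_i of the jobs of this part type
    Returns the machine order (S_1, ..., S_Q), q, and for each n = 1..q the
    capacity profile M_k, k = 1..v_{S_1} + K_l, using the first n machines.
    """
    order = sorted(range(len(avail)), key=lambda i: avail[i])  # S_1..S_Q
    v = [avail[s] for s in order]                               # sorted v_{S_j}
    K_l = sum(proc_times)            # horizon bound: all jobs on one machine
    # Machines that become free only at or after v_{S_1} + K_l are useless.
    q = max(j + 1 for j in range(len(v)) if v[j] < v[0] + K_l)
    horizon = v[0] + K_l
    profiles = {}
    for n in range(1, q + 1):
        # M_k = how many of the first n machines are already free at time k
        profiles[n] = [sum(1 for j in range(n) if v[j] <= k)
                       for k in range(1, horizon + 1)]
    return order, q, profiles


order, q, profiles = candidate_capacities([5, 1, 30], [4, 4, 4])
print(order, q)  # [1, 0, 2] 2
```

Each profile would then be handed to a Lagrangian relaxation solver, and the n giving the lowest setup-plus-tardiness cost retained; because the profiles are independent, that loop is where the parallelism mentioned above would be exploited.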

To determine the order of the part types, a higher-level simulated annealing optimization 
is carried out. The simulated annealing process gives the order in which to process 
the part types, taking into account the setup times and setup costs. Having obtained the 
order of part types, the schedule on each machine and cost is computed using the method 
discussed in the previous paragraph. It can easily be shown that in the global optimum 
schedule, jobs belonging to the same part type and having the same processing times and 
due dates have to be processed in the decreasing order of their weights (Srigopal 1994). 
These can be reordered to yield a lower cost at the termination of the algorithm. 
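The higher-level search can be sketched as a standard simulated annealing loop over permutations of the part types. The swap neighbourhood, geometric cooling and all parameter values below are illustrative choices of ours, not taken from the paper, and `cost` is a hypothetical callback standing in for the lower-level evaluation (machine selection plus Lagrangian relaxation) described earlier in this section.

```python
import math
import random


def anneal_order(part_types, cost, t0=100.0, alpha=0.95, iters_per_temp=50,
                 t_min=1e-3, seed=0):
    """Simulated annealing over the order in which part types are processed.

    cost(order) must return the total (setup + tardiness) cost of the
    schedule built for that order of part types.
    """
    rng = random.Random(seed)
    current = list(part_types)
    cur_cost = cost(current)
    best, best_cost = list(current), cur_cost
    t = t0
    while t > t_min:
        for _ in range(iters_per_temp):
            i, j = rng.sample(range(len(current)), 2)
            cand = list(current)
            cand[i], cand[j] = cand[j], cand[i]        # swap two part types
            c = cost(cand)
            # Always accept improvements; accept a worse order with
            # probability exp(-(increase in cost) / temperature).
            if c <= cur_cost or rng.random() < math.exp((cur_cost - c) / t):
                current, cur_cost = cand, c
                if c < best_cost:
                    best, best_cost = list(cand), c
        t *= alpha                                     # geometric cooling
    return best, best_cost
```

With a toy cost such as the number of out-of-order pairs, the loop recovers the sorted order; in the real setting each `cost` call would be an expensive Lagrangian relaxation run, which is why the number of evaluations per temperature matters in practice.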

3.1 Numerical results 

The examples discussed here are the multiclass versions of the ones appearing in Hoitomt 
et al (1990, 1993) and Luh et al (1990). 

3.1a Example 1 There are 12 jobs belonging to 4 part types (call the part types A, B, C, and D), shown in table 1. They are to be scheduled on 2 machines that are available from time instant 1. The initial state of M1 is A and that of M2 is B. The setup times and setup costs are shown in table 2.
Table 1. Job data for example 1.

i     w_i   t_i   D_i   Class
1     2     4     41    A
2     2     4     41    A
3     2     4     61    A
4     2     2     71    B
5     2     2     46    B
6     2     3     31    C
7     2     3     31    C
8     2     3     31    C
9     2     3     36    C
10    2     3     56    C
11    2     1     26    D
12    2     1     61    D

Table 3 shows the schedule obtained after 16 iterations. The cost of the above schedule is 
274 units, out of which setup costs account for 200 units and tardiness costs equal 74 units. 
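The reported cost can be checked directly from tables 1 to 3. The sketch below rests on conventions the text does not state, so treat them as our reading: time is slotted starting at unit 1, a setup occupies the machine immediately before the first job of the new class, a job started at unit k occupies units k, ..., k + t_i - 1, and the tardiness cost is $w_i \max(0, c_i - D_i)$. Under these assumptions it reproduces the 200 + 74 = 274 units reported for table 3.

```python
# Table 1: job i -> (w_i, t_i, D_i, class)
JOBS = {1: (2, 4, 41, "A"), 2: (2, 4, 41, "A"), 3: (2, 4, 61, "A"),
        4: (2, 2, 71, "B"), 5: (2, 2, 46, "B"),
        6: (2, 3, 31, "C"), 7: (2, 3, 31, "C"), 8: (2, 3, 31, "C"),
        9: (2, 3, 36, "C"), 10: (2, 3, 56, "C"),
        11: (2, 1, 26, "D"), 12: (2, 1, 61, "D")}
# Table 2: class -> (setup time, setup cost)
SETUP = {"A": (40, 200), "B": (20, 100), "C": (30, 150), "D": (10, 50)}


def evaluate(sequences, initial_states, start=1):
    """Return (setup cost, weighted tardiness cost) of per-machine sequences."""
    setup_cost = tardiness_cost = 0
    for machine, seq in sequences.items():
        state, k = initial_states[machine], start  # k: next free time unit
        for i in seq:
            w, t, due, cls = JOBS[i]
            if cls != state:                 # switchover to a new class
                s_time, s_cost = SETUP[cls]
                k += s_time                  # machine busy during the setup
                setup_cost += s_cost
                state = cls
            completion = k + t - 1           # occupies units k .. k + t - 1
            tardiness_cost += w * max(0, completion - due)
            k = completion + 1
    return setup_cost, tardiness_cost


print(evaluate({"M1": [1, 2, 3, 11, 12], "M2": [5, 4, 6, 7, 8, 9, 10]},
               {"M1": "A", "M2": "B"}))  # (200, 74)
```

The tardy jobs are the four class C jobs 6, 7, 8 and 9 on M2, which wait behind the 30-unit setup for class C.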


3.1b Example 2 There are 25 jobs belonging to 7 part types (call these A, B, C, D, E, F, and G), shown in table 4. Table 5 shows the setup times and setup costs. The jobs are to be scheduled on 4 machines that are available from time instant 1. The initial states of M1, M2, M3 and M4 are A, B, C and D respectively. The schedule obtained (after 49 iterations) is shown in table 6. The cost of this (suboptimal) schedule is 930 units: tardiness costs account for 130 units and setup costs for the remaining 800 units.


3.1c Example 3 Eighty-nine jobs belonging to 15 part types are to be scheduled on 
10 machines (see table 7). The first five machines are available from the beginning of the 
planning horizon and the next five are available from time instant 10. Initial states of the 
machines 1,2,..., 10 are A, B,..., J respectively. Table 8 shows the setup times and the 
setup costs. The detailed schedule, obtained after 225 iterations, is shown in table 9. The cost of the schedule is 4749 units, of which 1199 units are tardiness costs and the remaining 3550 units are setup costs.


Table 2. Setup times and setup costs.

Job class    A     B     C     D
Setup time   40    20    30    10
Setup cost   200   100   150   50


Table 3. Schedule obtained for example 1.

M1    1  2  3  11  12
M2    5  4  6  7  8  9  10


Table 4. Job data for example 2.

i     w_i   t_i   D_i   Class
1     6     4     21    A
2     2     4     21    A
3     5     4     101   A
4     2     5     61    B
5     8     5     101   B
6     2     8     61    C
7     2     8     41    C
8     5     8     76    C
9     2     8     126   C
10    1     2     61    D
11    2     2     20    D
12    2     2     66    D
13    1     2     101   D
14    6     2     126   D
15    2     6     126   E
16    2     7     61    F
17    2     7     126   F
18    2     7     176   F
19    2     7     76    F
20    2     7     101   F
21    2     7     101   F
22    2     7     151   F
23    2     7     151   F
24    2     3     176   G
25    2     3     76    G







Table 5. Setup times and setup costs for example 2.

Job class    A     B     C     D     E     F     G
Setup time   40    50    80    20    60    70    30
Setup cost   200   250   400   100   300   350   150


4. Scheduling an FMS with centralized material handling 

4.1 Architecture of the FMS 

The FMS under study consists of three identical numerically controlled machines (NCs), a rail-guided vehicle (RGV), a pallet pool (PP), a pallet preparation area (PPA), and a tool preparation area (TPA). Each fixture is first prepared in the PPA (mount operation). It is then taken to the NCs for machining, which requires a predetermined amount of time. The fixture is then unmounted in the PPA and the pallet is released back into the PP. The NCs and the PPA are assumed to have infinite buffer capacity. Periodically, the RGV feeds the tool 


Table 6. Schedule obtained for example 2.

M1    1  2  3  15
M2    4  5  24  25
M3    7  6  8  9
      10  11  12  13  14  16
      17  20  21  18  22  23

magazine of the NCs with tools prepared in the TPA and is then unavailable for transit operations. The travel times between the PP and the PPA, between the PPA and the NC centres, and between the TPA and the NCs are all given. However, these times apply only when the RGV is loaded; if the RGV is free, it takes a negligible amount of time to reach any of the facilities.

Every job (pallet) therefore undergoes seven operations:

(1) Travel from the PP to the PPA. 

(2) The mount operation in the PPA. 

(3) Travel from the PPA to the NC centres. 


Table 7. Job data for example 3.

i     w_i   t_i   D_i   Class
1     1     1     1     A
2     9     1     121   A
3     1     1     131   A
4     5     1     11    A
5     1     1     11    A
6     1     1     11    A
7     1     1     -29   A
8     1     1     71    A
9     1     1     71    A
10    1     1     71    A
11    1     1     71    A
12    1     1     71    A
13    1     1     81    A
14    1     1     81    A
15    6     1     21    A
16    1     1     21    A
17    1     1     21    A
18    1     1     51    A
19    9     1     -9    A
20    1     1     -9    A
21    1     1     101   A
22    1     1     101   A
23    1     1     101   A
24    1     1     91    A
25    1     2     -19   B
26    1     2     151   B
27    1     2     151   B
28    1     2     1     B
29    1     2     1     B
30    1     2     131   B
31    1     2     -9    B
32    1     2     -9    B
33    1     2     -9    B
34    1     2     81    B
35    1     2     71    B
36    1     2     71    B
37    1     2     71    B
38    1     2     41    B
39    1     2     11    B
40    1     2     11    B
41    1     2     11    B
42    1     3     251   C
43    1     3     241   C
44    1     3     201   C
45    1     3     21    C
46    1     3     21    C
47    1     3     21    C
48    1     3     21    C
49    1     3     21    C
50    1     3     21    C
51    1     3     21    C
52    1     3     71    C
53    1     3     71    C
54    1     3     91    C
55    1     3     91    C
56    1     3     91    C

(Continued)








Y Narahari and R Srigopal 


Table 7. Continued.

i     w_i   t_i   D_i   Class
57    1     3     51    C
58    1     3     11    C
59    6     3     121   C
60    1     3     101   C
61    1     3     1     C
62    1     3     1     C
63    1     3     31    C
64    1     4     181   D
65    1     4     1     D
66    9     4     11    D
67    1     4     91    D
68    1     4     41    D
69    1     4     151   D
70    6     5     191   E
71    1     5     51    E
72    1     5     51    E
73    16    5     21    E
74    6     5     101   E
75    16    6     71    F
76    1     7     31    G
77    6     8     601   H
78    1     8     111   H
79    1     8     101   H
80    1     9     81    I
81    9     10    91    J
82    1     10    201   J
83    1     10    201   J
84    1     11    111   K
85    6     12    91    L
86    1     15    421   M
87    1     16    241   N
88    1     16    171   N
89    1     20    241   O







(4) Undergo machining in one of the NC machines. 

(5) Travel back to the PPA for the unmount operation. 

(6) The unmount operation at the PPA. 

(7) Travel back to the PP. 

For the sake of convenience, we re-label the RGV as facility 1, the PPA as facility 2, and the three NCs as facilities 3, 4 and 5 respectively. We also find it useful to label the RGV as machine type 1, the PPA as machine type 2, and the three NC machines as machine type 3.
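With this labelling, the seven-step route of every pallet becomes a fixed sequence of machine types: 1, 2, 1, 3, 1, 2, 1. As a sanity check, the sketch below encodes that route and tests the rows reported for job 10 in table 11. It assumes that the $m_{ij}$ column of table 11 records the machine type and that an operation occupying units $b_{ij}$ to $c_{ij}$ inclusive lasts $c_{ij} - b_{ij} + 1$ units; the function name `check_route` is hypothetical.

```python
# Machine type needed by operations 1..7: travel (RGV = 1), mount (PPA = 2),
# travel, machining (NC = 3), travel, unmount (PPA = 2), travel.
OP_TYPE = [1, 2, 1, 3, 1, 2, 1]


def check_route(rows):
    """rows: (b_ij, c_ij, m_ij) for j = 1..7. Returns the implied durations."""
    if len(rows) != len(OP_TYPE):
        raise ValueError("a job must have exactly seven operations")
    for j, (b, c, m) in enumerate(rows, start=1):
        if m != OP_TYPE[j - 1]:
            raise ValueError(f"operation {j} uses machine type {m}")
        if j > 1 and b <= rows[j - 2][1]:
            raise ValueError(f"operation {j} starts before operation {j - 1} ends")
    # With inclusive intervals, an operation on b..c lasts c - b + 1 units.
    return [c - b + 1 for b, c, _ in rows]


job10 = [(1, 2, 1), (3, 5, 2), (6, 6, 1), (7, 54, 3),
         (55, 55, 1), (56, 56, 2), (57, 58, 1)]
print(check_route(job10))  # [2, 3, 1, 48, 1, 1, 2]
```

The implied durations agree with the stated travel times (2 units between the PP and the PPA, 1 unit between the PPA and the NCs) and with the machining time of 48 units for job 10 in table 10; the mount and unmount durations of 3 and 1 units are only what the table implies, as the text does not give them.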


Table 8. Setup times and setup costs for example 3.

Class   A     B     C     D     E     F     G     H
Time    10    20    30    40    50    60    70    80
Cost    50    100   150   200   250   300   350   400

Class   I     J     K     L     M     N     O
Time    90    100   110   120   150   160   200
Cost    450   500   550   600   750   800   1000
Table 9. Schedule obtained for example 3.

M1    84
M2    25  28  29  38  37  34
M3    61  62  58  53  54  55
M4    65  66  68
M5    71  72  74
M6    75  19  15  16  24  8  11  23  14
M7    76  85
M8    78  77  79
M9    80  86
M10   82  81  83

31  32  33  39  27  30  26
48  49  50  45  56  57  43  60  67  69  64  87  73  70  89
4  5  6  20  22  3  13  10
40  41  35  36  51  46  47  63  44  88  42  52  59  17  7  1  18  12  9  2  21


4.2 A new formulation 


The job shop scheduling methodology of Hoitomt et al (1990, 1993) was implemented for the above problem, and an oscillation phenomenon was found to persist despite the addition of quadratic penalty terms. Therefore, the slightly different problem formulation shown below is employed. Define the following additional variables.

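$\omega_{ijk}$: integer variable, equal to 1 for every time unit $k \le c_{ij}$ and 0 otherwise.

$\sigma_{ijk}$: integer variable, equal to 1 for every time unit $k \ge b_{ij}$ and 0 otherwise.

With these new variables, the precedence constraints (16) can be replaced by the following:

$$\omega_{ijk} + \sigma_{i(j+1)k} \le 1 \quad (i = 1, 2, \ldots, N;\ j = 1, 2, \ldots, 6;\ k = 1, 2, \ldots, K). \tag{17}$$

Together these constraints require $b_{i(j+1)} > c_{ij}$: operation j + 1 of job i cannot begin until operation j has been completed.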


With (17) in place of (16), the problem formulation is still valid, and standard Lagrangian relaxation with subgradient optimization can be applied.
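The effect of (17) can be checked by brute force. The sketch below assumes our reading of the definitions, $\omega_{ijk} = 1$ for $k \le c_{ij}$ and $\sigma_{ijk} = 1$ for $k \ge b_{ij}$, and confirms over a small horizon that (17) holds for a pair of consecutive operations exactly when the second starts after the first completes. The helper names are hypothetical.

```python
def omega(c, K):
    """omega_k = 1 while the operation has not yet completed (k <= c)."""
    return [1 if k <= c else 0 for k in range(1, K + 1)]


def sigma(b, K):
    """sigma_k = 1 once the operation has begun (k >= b)."""
    return [1 if k >= b else 0 for k in range(1, K + 1)]


def satisfies_17(c_j, b_next, K):
    """Constraint (17) for one pair of consecutive operations j and j + 1."""
    return all(o + s <= 1 for o, s in zip(omega(c_j, K), sigma(b_next, K)))


# (17) holds exactly when operation j + 1 starts after operation j completes.
K = 10
print(all(satisfies_17(c, b, K) == (b > c)
          for c in range(1, K + 1) for b in range(1, K + 1)))  # True
```

Writing precedence as one inequality per time unit is what lets each constraint carry its own multiplier $\mu_{ijk}$ in the relaxation that follows.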

Now, we relax the capacity constraints (14) using Lagrangian multipliers $\pi_{kh}$ (k = 1, 2, ..., K; h = 1, 2, 3) and the precedence constraints (17) using Lagrangian multipliers $\mu_{ijk}$ (i = 1, 2, ..., N; j = 1, 2, ..., 6; k = 1, 2, ..., K) to form the relaxed problem

$$R: \min_{\{b_{ij}\}} V, \tag{18}$$

with V given by

$$V = \sum_i \Big[ w_i T_i^2 + \sum_{j,k} \mu_{ijk}\big(\omega_{ijk} + \sigma_{i(j+1)k} - 1\big) + \sum_j \sum_{k=b_{ij}}^{c_{ij}} \pi_{k h_{ij}} \Big] - \sum_{k,h} \pi_{kh} M_{kh}, \tag{19}$$

subject to (15). Then the dual problem is

$$D: \max_{\pi, \mu} L, \tag{20}$$

with L given by

$$L = \min_{\{b_{ij}\}} \Big\{ \sum_i \Big[ w_i T_i^2 + \sum_{j,k} \mu_{ijk}\big(\omega_{ijk} + \sigma_{i(j+1)k} - 1\big) + \sum_j \sum_{k=b_{ij}}^{c_{ij}} \pi_{k h_{ij}} \Big] - \sum_{k,h} \pi_{kh} M_{kh} \Big\}, \tag{21}$$

subject to

$$\pi \ge 0, \quad \mu \ge 0. \tag{22}$$

This leads to the following decomposed subproblems for each job i (given $\pi$, $\mu$):

$$L = \sum_i L_i(\mu, \pi) - \sum_{k,h} \pi_{kh} M_{kh}, \tag{23}$$

with

$$L_i(\mu, \pi) = -\sum_{j,k} \mu_{ijk} + \min_{\{b_{ij}\}} \Big[ w_i T_i^2 + \sum_j \Big( \sum_{k=1}^{c_{ij}} \mu_{ijk} + \sum_{k=b_{ij}}^{K} \mu_{i(j-1)k} + \sum_{k=b_{ij}}^{c_{ij}} \pi_{k h_{ij}} \Big) \Big], \tag{24}$$

subject to (15), where $\mu_{i0k} = \mu_{i7k} = 0$. Here $b_{ij}$ is not a linear function of the multipliers $\mu_{ijk}$, and therefore the oscillation phenomenon disappears.

The steps in obtaining a near-optimum solution (Hoitomt et al 1990, 1993; Luh et al 1990) are


(1) solving the subproblems, 

(2) solving the dual problem, 

(3) constructing a feasible solution and 

(4) finding a (sub) optimal solution. 


4.2a Scheduling individual operations The scheduling of (i, j) becomes the selection of optimal beginning times $b_{ij}$. To do this, $L_i$ is computed for each possible value of $b_{ij}$, and $b^*_{ij}$ is the value yielding the lowest $L_i$. This selection is again decomposable by operations, so the determination of the $b^*_{ij}$ can be parallelized.
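Because the precedence constraints are relaxed, each operation's start time can be chosen on its own. The sketch below enumerates $b_{ij}$ for one operation using only the terms of (24) that depend on it. It assumes that (15) is the processing-time relation $c_{ij} = b_{ij} + t_{ij} - 1$, stores the value for time unit k at list position k - 1, and passes the tardiness penalty $w_i T_i^2$ as an optional `extra` term for the last operation of a job. Function names are hypothetical.

```python
def op_cost(b, t, mu_j, mu_prev, pi_h):
    """Start-time-dependent terms of (24) for one operation.

    mu_j    -- mu_{ijk}, k = 1..K (precedence with the next operation)
    mu_prev -- mu_{i(j-1)k}, k = 1..K (precedence with the previous one)
    pi_h    -- pi_{kh}, k = 1..K, for the machine type h = h_ij
    """
    K = len(pi_h)
    c = b + t - 1                              # completion time c_ij
    return (sum(mu_j[:c])                      # sum_{k=1}^{c_ij} mu_ijk
            + sum(mu_prev[b - 1:K])            # sum_{k=b_ij}^{K} mu_i(j-1)k
            + sum(pi_h[b - 1:c]))              # sum_{k=b_ij}^{c_ij} pi_kh


def best_start(t, mu_j, mu_prev, pi_h, extra=lambda c: 0.0):
    """Enumerate b = 1..K - t + 1 and return (b*, cost); extra(c) adds w_i T_i^2."""
    K = len(pi_h)
    return min(((b, op_cost(b, t, mu_j, mu_prev, pi_h) + extra(b + t - 1))
                for b in range(1, K - t + 2)), key=lambda p: p[1])


# High capacity prices at times 1, 4 and 5 push a 2-unit operation to b = 2.
print(best_start(2, [0] * 5, [0] * 5, [5, 1, 1, 5, 5]))  # (2, 2.0)
```

A large $\mu_{i(j-1)k}$ makes an early start expensive (pushing the operation after its predecessor), while a large $\mu_{ijk}$ makes a late completion expensive; each call is independent, which is what makes the step parallelizable.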

4.2b Solving the dual problem A subgradient method is used to solve the dual problem. The multipliers $\pi$ and $\mu$ are updated according to

$$\pi^{n+1} = \pi^n + \alpha_1^n\, g(\pi^n), \tag{25}$$

$$\mu^{n+1} = \mu^n + \alpha_2^n\, g(\mu^n), \tag{26}$$
where n is the iteration index, $g_{kh}(\pi^n)$ is the kh-th component of the subgradient of the dual function with respect to $\pi$ and equals $\sum_{i,j} \delta_{ijk} - M_{kh}$ (the sum running over the operations that require machine type h), and $\alpha_1^n$ is the step size at the nth iteration. Similarly, $g_{ijk}(\mu^n) = \omega_{ijk} + \sigma_{i(j+1)k} - 1$. The step sizes are

$$\alpha_1^n = \lambda (\bar{L} - L^n) / \| g(\pi^n) \|^2, \tag{27}$$

$$\alpha_2^n = \lambda (\bar{L} - L^n) / \| g(\mu^n) \|^2, \tag{28}$$

where $\bar{L}$ is an estimate of the optimal dual value and $L^n$ is the value of the dual function at the nth iteration. The method converges at a geometric rate. The parameter $\lambda$ is halved whenever $L^n$ fails to increase over a fixed number of iterations. The subgradient algorithm terminates when the step size $\alpha^n$ becomes and remains small for a fixed number of iterations while $L^n$ is not increasing.
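One iteration of (25) to (28), together with the halving rule for $\lambda$, can be sketched as below. The multipliers are treated as a flat list, and the projection onto nonnegative values is our assumption (the dual requires $\pi \ge 0$ and $\mu \ge 0$, but the update formulas as written do not show a projection). Function names are hypothetical.

```python
def subgradient_step(mult, grad, lam, L_bar, L_n):
    """One multiplier update with step lam * (L_bar - L_n) / ||g||^2.

    The max(0, .) projection keeps the multipliers nonnegative (assumption).
    """
    norm2 = sum(g * g for g in grad)
    if norm2 == 0:                     # zero subgradient: nothing to update
        return list(mult)
    alpha = lam * (L_bar - L_n) / norm2
    return [max(0.0, m + alpha * g) for m, g in zip(mult, grad)]


def update_lambda(lam, history, patience):
    """Halve lam if the dual value has not increased in `patience` iterations."""
    if len(history) > patience and max(history[-patience:]) <= history[-patience - 1]:
        return lam / 2
    return lam


print(subgradient_step([1.0, 0.0], [2.0, -1.0], lam=1.0, L_bar=15.0, L_n=10.0))
# [3.0, 0.0]
```

The same function serves for both $\pi$ (with $g(\pi^n)$) and $\mu$ (with $g(\mu^n)$), since (27) and (28) differ only in the subgradient used for normalization.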

4.2c Construction of a feasible schedule Because of the stopping criterion used, the solution in the dual space is generally associated with an infeasible schedule, viz. some of the capacity constraints (14) and precedence constraints (17) may be violated. The processing time constraints are always satisfied. In the optimal dual solution, each operation is uniquely associated with a beginning time $b^*_{ij}$.

The dual solution is first modified to ensure that the precedence constraints are satisfied. This is done by pushing forward in time all $b_{ij}$ that violate precedence constraints, starting with the second operation of each job. A list U is then created by arranging the operations of all jobs in ascending order of the modified beginning times. Operations are scheduled on the required machines as they become available. If the capacity constraint for a particular machine type is violated, a greedy heuristic based on the incremental change in J determines which operations should begin at that time and which ones are to be delayed by one time unit.

Operations in U are ordered in such a way that if (i, j) is before another operation (u, v), then

(1) $b_{ij} \le b_{uv}$, and

(2) if $b_{ij} = b_{uv}$, then $f(i, j) \ge f(u, v)$.

The incremental cost function for job i is defined as 

$$f(i, j) = w_i \big[(T_i + 1)^2 - T_i^2\big]. \tag{29}$$

Additionally, let n be a time index tracking machine availability, and let E be the set of unscheduled operations that cannot be scheduled between time n and their respective beginning times $b_{ij}$. Given the sequence U and $\{M_{kh}\}$, the greedy algorithm works as follows.

Step 0: Set $E = \emptyset$ and go to step 1.

Step 1: Determine the job and operation indices i and j of the first operation in U. Determine $h_{ij}$; set $b = b_{ij}$.

Step 2: Determine the first time l such that $M_{l h_{ij}} > 0$. Set n = l, and go to step 3.

Step 3: If $M_{l h_{ij}} \ne 0$ for l = n, n + 1, ..., $n + t_{ij} - 1$, go to step 4. Otherwise go to step 5.
Table 10. Job data for the FMS.

i     w_i   t_i   D_i
1     1     6     11
2     1     10    21
3     2     12    1
4     1     48    41
5     2     72    51
6     3     12    101
7     3     48    51
8     3     48    61
9     3     48    71
10    1     48    1


Step 4: If the precedence constraints related to preceding operations are not violated, set $M_{l h_{ij}} = M_{l h_{ij}} - 1$ for l = n, n + 1, ..., $n + t_{ij} - 1$ and go to step 8. Otherwise go to step 5.

Step 5: Set n = n + 1. If n > b, go to step 6; otherwise go to step 3.

Step 6: If operation 2 on list U has beginning time b, then set $U = U - \{(i, j)\}$, re-index the sequence, set $E = E \cup \{(i, j)\}$, and go to step 1; otherwise go to step 7.

Step 7: For any unscheduled operations $(i, j) \in E$ such that $b_{ij} < b$, set $b_{ij} = b + 1$; check all subsequent operations of job i and reset those beginning times that violate precedence constraints; set $U = U \cup E$ and re-form the sequence U; set $E = \emptyset$ and go to step 1.

Step 8: Set $U = U - \{(i, j)\}$; if $U = \emptyset$, stop; otherwise go to step 1.
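The core of the repair can be sketched as a simplified list-scheduling loop. This is not the full eight-step algorithm: the deferred set E and the resetting of beginning times in step 7 are omitted. Operations are ordered by beginning time, with ties broken by the larger incremental cost (29), and each is delayed one time unit at a time until a machine of the required type is free for its whole duration. The tuple layout, the per-job tardiness estimate `T` taken from the dual solution, and the function names are all assumptions for illustration.

```python
def incremental_cost(w, T):
    """Equation (29): extra cost w[(T + 1)^2 - T^2] of one more tardy unit."""
    return w * ((T + 1) ** 2 - T ** 2)


def greedy_repair(ops, capacity):
    """Simplified list-scheduling repair of a dual solution.

    ops      -- tuples (i, j, b, t, h, w, T): job, operation, dual start time,
                duration, machine type, job weight, job tardiness estimate
    capacity -- capacity[h][l] = M_lh, free machines of type h at time l
                (index 0 unused; lists must be long enough for every operation)
    Precedence is enforced only against predecessors already scheduled, so
    the b values are assumed to have been pushed forward beforehand.
    Returns {(i, j): (start, completion)}.
    """
    order = sorted(ops, key=lambda o: (o[2], -incremental_cost(o[5], o[6])))
    schedule = {}
    for i, j, b, t, h, w, T in order:
        start = b
        if (i, j - 1) in schedule:                   # respect precedence
            start = max(start, schedule[(i, j - 1)][1] + 1)
        # Delay one unit at a time until a type-h machine is free for t units.
        while any(capacity[h][k] <= 0 for k in range(start, start + t)):
            start += 1
        for k in range(start, start + t):
            capacity[h][k] -= 1
        schedule[(i, j)] = (start, start + t - 1)
    return schedule


cap = {1: [0] + [1] * 10}
print(greedy_repair([(1, 1, 1, 3, 1, 1, 0), (2, 1, 1, 2, 1, 5, 2)], cap))
# {(2, 1): (1, 2), (1, 1): (3, 5)}
```

In the example both operations want time 1 on a single machine. Job 2 goes first because delaying it would add $5[(2+1)^2 - 2^2] = 25$ to the squared-tardiness objective, against 1 for job 1.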

4.2d Performance evaluation Once a feasible schedule is obtained, the corresponding value of the objective function J is an upper bound on the optimal objective value J*. The value of the dual function L*, on the other hand, is a lower bound on J*. The difference J - L* is thus an upper bound on the duality gap and a measure of the suboptimality of the feasible schedule with respect to the optimal schedule.

4.3 Numerical results 

Ten jobs are to be scheduled on the system under consideration (table 10). The travel times are 2 units from the PP to the PPA and vice versa, and 1 unit from the PPA to the NCs and vice versa. Table 11 shows the detailed schedule obtained. The cost of the schedule is 50435 units. The best lower bound that could be obtained (by tuning the various parameters) was 31084 units. Thus, the cost of the schedule exceeds the optimum by at most about 62%.
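The suboptimality bound follows from weak duality: since $L^* \le J^*$, the relative excess $(J - J^*)/J^*$ is at most $(J - L^*)/L^*$. A one-line check with the figures reported above (the function name is ours):

```python
def relative_gap(J, L_star):
    """Upper bound on (J - J*) / J*, valid because L_star <= J* (weak duality)."""
    return (J - L_star) / L_star


print(f"{relative_gap(50435, 31084):.1%}")  # 62.3%
```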

5. Conclusions 

In the first part of this work, a new hybrid scheduling algorithm that uses simulated annealing and Lagrangian relaxation has been proposed and tested for multiclass production systems consisting of identical parallel machines. The technique was found to work well on the examples considered. However, two key issues,

(1) performance evaluation, and

(2) schedule reconfiguration in the event of dynamic changes,

have not been addressed. Future work should concentrate on

(1) answering questions regarding performance evaluation and schedule reconfiguration in the event of dynamic changes for multiclass production systems, and

(2) extending the hybrid technique to job shop scheduling.

In the second part of this work, a Lagrangian relaxation algorithm for job shops was extended to the scheduling of a real-world flexible manufacturing system. An interesting feature of the algorithm is that it is quite sensitive to the initial value of $\lambda$. Also, it is


Table 11. Schedule obtained for the FMS.

b_ij  c_ij  m_ij    b_ij  c_ij  m_ij
13  17  2  3  18  is  i  4  24  3  3  11  3  24  26  8

Table 11. (Continued.)

b_ij  c_ij  m_ij    i     j     b_ij  c_ij  m_ij
8  1  1  9  2  1  9  2  9  7  1
                    10    1     1     2     1
                    10    2     3     5     2
                    10    3     6     6     1
                    10    4     7     54    3
                    10    5     55    55    1
                    10    6     56    56    2
                    10    7     57    58    1


not monotonic. In other words, a higher value of the dual function at the termination of the subgradient optimization algorithm does not necessarily yield a better schedule. This is mainly attributable to the termination condition and to the heuristic that is applied to arrive at a feasible schedule. Future work should concentrate on designing a monotonic Lagrangian relaxation algorithm that is insensitive to the initial value of $\lambda$, or at least on providing an empirical rule for arriving at good lower bounds and better quality schedules.
Abstract. The problem addressed is one of model reference adaptive control (MRAC) of asymptotically stable plants of unknown order with zeros located anywhere in the s-plane except at the origin. The reference model is also asymptotically stable and lacks zero(s) at s = 0. The control law is to be specified only in terms of the inputs to and outputs of the plant and the reference model. For inputs from a class of functions that approach a non-zero constant, the problem is formulated in an optimal control framework. By successive refinements of the sub-optimal laws proposed here, two schemes are finally designed. These schemes are characterized by boundedness, convergence and optimality. Simplicity and a fully time-domain implementation are additional notable features. Simulations demonstrating the efficacy of the control schemes are presented.

Keywords. Model reference adaptive control; nonminimum phase; unknown 
order; optimal control. 


1. Introduction 

By ‘control’ of a system is meant the process of achieving the desired or close-to-desired 
performance from the system by manipulating it in some fashion. Often, some simplifying 
assumptions - linearity, time-invariance, etc. - of the system are made to make the problem 
more tractable. When the assumptions are too naive or when enhanced performance under 
varying conditions is required, relatively refined strategies are sought. The built-in capability of a system for such refinement is variously termed adaptation, self-organization, 
learning and intelligence. An adaptive controller is one which is capable of reconfiguring 
itself for the ‘better’, based on its observation of the process as it unfolds. 

Two approaches, direct and indirect, have been reported in the literature on adaptive controllers (Astrom 1987). In the indirect approach, the plant is represented by an explicit model, the unknown parameters of which are estimated by on-line system identification techniques (Ljung 1987). This requires knowledge of the plant order or, at any rate, of an upper bound on the plant order. In the absence of knowledge of even a reasonable upper bound on the order, order estimation techniques are resorted to (Sliwa 1984). They are commonly built upon some heuristics, and hence are feasible only in situations where, inter alia, speed of adaptation and plant stability are not critical. In self-tuning regulators, on-line tuning of the controller is achieved through certainty equivalence. Alternatively, when no explicit effort is made to identify the plant parameters but the controller is directly adjusted to minimize the error between the plant and reference model outputs, the approach is termed direct model reference adaptive control (MRAC).

Stability of the direct MRAC algorithms is guaranteed in general under the following assumptions (Narendra & Annaswamy 1989; Sastry & Bodson 1993):

(i) the plant is minimum phase (zeros restricted to the left half of the s-plane);

(ii) the order or an upper bound on the order of the plant is known; 

(iii) the reference model is stable and is minimum phase; 

(iv) the sign of the high-frequency gain of the plant is known; 

(v) the input is persistently exciting. 

Assumption (iv) above has been relaxed for a minimum phase plant (Morse 1987). In a 
recent piece of work (Tao & Ioannou 1993), one of the basic assumptions in stable MRAC 
- that the relative degree of the modelled part of the plant is known exactly and is no 
greater than that of the reference model - is relaxed; the scheme there requires only an 
upper bound for the relative degree of the plant. 

An independent direction recently explored (Bar-Kana et al 1983; Sobel & Kaufman 
1987) for the direct adaptive control problem is the Command Generator Tracker (CGT) 
method. In this approach (Sobel & Kaufman 1987), the output error is used to compute 
the adaptive gains obtained as a combination of ‘proportional’ and ‘integral’ terms. The 
plant command is generated in terms of these adaptive gains. The reference trajectories 
which the plant output has to follow are limited to the class of outputs of a free-running 
LTI dynamical system. 

The strictly positive real (SPR) property is pivotal in several adaptive control schemes. In the CGT approach, for instance, the closed-loop plant being SPR implies asymptotic stability of the system, boundedness of the controller gains and asymptotic tracking. Being highly restrictive, the SPR condition is often relaxed; the plant must then be almost SPR (ASPR) (Narendra & Annaswamy 1989). If the plant, or an augmented version of it, is ASPR, then a suitable Lyapunov function can be constructed and stability ensured. Otherwise, the existence of a suitable Lyapunov function cannot, in general, be taken for granted. Both the direct MRAC and the CGT approaches thus call for certain precise a priori structural information about the plant, the order being one such piece of information. In practice, systems frequently have orders much in excess of those assumed for identification. Even if identification were the 'best' with respect to the chosen model, the effect of the unmodelled dynamics of a linear plant cannot always be neglected for persistently exciting inputs. Indeed, all order estimation techniques are prone to be inexact, based as they invariably are on some heuristic. Despite the existence of rigorous 



Optimal tuning of nonminimum phase systems 




stability proofs, it has been demonstrated (Rohrs et al 1982, 1985) that several algorithms are non-robust in the presence of 'small' unmodelled dynamics. Hence there is a strong case for developing algorithms that are independent of information about the plant order.

Persistency of excitation, a condition which can be met in practice only by artificial 
injection of probing signals, is attended with many problems. In several situations, it is 
simply infeasible to continually inject probing signals. Questions that naturally arise in 
this context are: Can persistency of excitation be done away with? If yes, what are the 
issues that need to be addressed as a consequence? What are the associated advantages 
and disadvantages? 

The plant being minimum phase is a rather restrictive, yet vital, factor for the stability of adaptive systems, as established. Interest in plants which are not necessarily minimum phase has also motivated research, some of which is mentioned below.

• A conceptual approach involving a bilinear parameter estimation problem has been proposed (Astrom 1980; Praly 1984). The procedure (Astrom 1980) is based on identification of an implicit plant model and pole-zero placement design. The estimation requires that a quadratic criterion be minimized at each t (Praly 1984). The assumptions are stabilizability, knowledge of an upper bound on the system order and knowledge of an upper bound on the system parameters.

• A class of discrete-time systems inclusive of all minimum phase, all stable nonminimum phase and some unstable nonminimum phase systems has been considered (Goodwin et al 1981). Global stability is assured under the 'key substantive' assumption that the one-step-ahead optimal controller designed using the true (but unknown) system parameters leads to a stable closed-loop system; that is, closed-loop stability is hypothesized. A modification of the one-step-ahead law to accommodate nonminimum phase zeros has been suggested (Hartley & Sarantopoulos 1991). Since these procedures involve the estimation of plant parameters, they assume an explicit form for the plant transfer function.

• Loop Transfer Recovery (LTR) as applied to minimum phase plants (Doyle & Stein 1979, 1981) has been generalized to nonminimum phase plants (Zhang & Freudenberg 1990). Necessary and sufficient conditions for a nonminimum phase plant to have a recoverable target loop have been derived (Chen et al 1992). These procedures involve, inter alia, state observers and a state feedback loop.

Thus the knowledge of the plant order (or an upper bound on the order) is inherent in these 
approaches. 

It is more often the rule than the exception that plants that are minimum phase in continuous time are nonminimum phase in their discrete-time approximate representations. If an upper bound on the plant order is unknown, the reported studies of plants with nonminimum phase zeros are inapplicable. Moreover, when the plant is nonminimum phase, matching the closed-loop transfer function to the transfer function of the reference model is impossible for persistently exciting inputs (except possibly in the trivial case when all nonminimum phase zeros of the unknown plant and the reference model have exactly matching locations and corresponding orders). This is because a nonminimum phase plant does not have a stable inverse. Consequently, if asymptotically exact model following is demanded, adaptive tuning of systems that are allowed to be nonminimum phase must be restricted to some constrained class of inputs.

There is a good practical reason (Benes 1965) to restrict the inputs to the Marcinkiewicz space $M_2$, the space of bounded functions of bounded power*:

$$M_2 = \Big\{ x(\cdot): [0, \infty) \to \mathbb{R} \;:\; \int_0^t |x(\tau)|^2\, d\tau < \infty \ \ \forall t \in \mathbb{R}^+, \ \ \limsup_{T \to \infty} \frac{1}{T} \int_0^T |x(t)|^2\, dt < \infty \Big\}.$$

$L_\infty \subset M_2$. In particular, the step and the sinusoids belong to $M_2$. It is significant here to observe that in the CGT approach the reference trajectories which the plant output has to follow are restricted to the class of outputs of a free-running LTI dynamical system.

The main features of this presentation constitute relaxing the three usual requirements 
as follows. 

(a) No a priori information is needed about the parameters or the order or relative degree 
of the plant; 

(b) the plant is allowed to be nonminimum phase; 

(c) the input is not necessarily persistently exciting. 

However, the demand is that the plant be asymptotically stable. An additional mild restriction is that the plant and the reference model shall have no zero(s) at s = 0.

This paper is organized as follows. The problem is formulated in § 2. Section 3 addresses 
amplitude matching, in part, and sign matching of the outputs. Two optimal schemes are 
proposed in § 4. Simulations comprise § 5. 


2. Problem formulation 

In this section, we formally state the problem, state some preliminary results and cast the 
problem in an optimization framework. Alongside the process of choosing a criterion to be 
minimized, we show how it captures all aspects of the problem. Our approach to solving 
the problem will also be indicated. 

2.1 The statement of the problem 

The MRAC problem addressed here involves finite-dimensional, LTI, SISO systems. The given plant $G_p(s)$ is such that

• (A1) its order or even an upper bound on its order is unknown;

• (A2) it may be nonminimum phase; 


* $\mathbb{R}^+ = (0, \infty)$.

• (A3) it is asymptotically stable;

• (A4) it is strictly proper; and

• (A5) it has no zero(s) at s = 0. 

The reference model, fully specified by a transfer function $G_m(s)$, satisfies (A2)-(A5). The nonminimum phase zero(s) of the plant and the model need not have matching location(s).

It is often desired to match, or approximate as closely as possible, the step responses of the plant to be controlled and the reference model. We therefore allow the reference model inputs to belong to a class of 'step-like' functions, i.e., functions that asymptotically approach a constant. This class, to which the reference model inputs belong, is defined as follows.†

DEFINITION 1 

$\Omega_{u,f} = \{u : [0, \infty) \to \Re\}$ such that $u(t)$

(1) is bounded and is either continuous or has finitely many discontinuities of the first kind;

(2) is differentiable as $t \to \infty$ and the limit of its time derivative is zero;

(3) has a non-zero limit (which is finite by virtue of (1) and (2)) as $t \to \infty$. □
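As an illustration of definition 1, the following Python sketch samples a candidate input on a finite horizon and checks the three conditions heuristically: finite samples, a vanishing finite-difference derivative over the tail, and a final value bounded away from zero. The function name `looks_step_like`, the horizon and the tolerances are illustrative assumptions; a finite simulation can suggest, but never prove, membership in $\Omega_{u,f}$.

```python
import math

def looks_step_like(u, t_end=50.0, dt=1e-3, tail=5.0, tol=1e-3):
    """Heuristic check of definition 1 for a callable u(t) sampled on [0, t_end].

    Returns (verdict, final_value)."""
    n = int(round(t_end / dt))
    samples = [u(k * dt) for k in range(n + 1)]
    # Condition (1), weakly: every sample is a finite number.
    finite = all(math.isfinite(v) for v in samples)
    # Condition (2): the finite-difference derivative is small over the last `tail` seconds.
    k0 = int(round((t_end - tail) / dt))
    slope = max(abs(samples[k + 1] - samples[k]) / dt for k in range(k0, n))
    # Condition (3): the value reached at t_end is clearly non-zero.
    final_value = samples[-1]
    return finite and slope < tol and abs(final_value) > tol, final_value

print(looks_step_like(lambda t: 1.0 + math.exp(-t) * math.sin(5.0 * t)))  # accepted, final value ~1
print(looks_step_like(math.sin)[0])                             # rejected: derivative does not vanish
print(looks_step_like(lambda t: math.exp(-t))[0])               # rejected: the limit is zero
print(looks_step_like(lambda t: 1.0 - 2.0 * math.exp(-t))[0])   # accepted despite a zero-crossing
```

The last input, $1 - 2e^{-t}$, crosses zero at $t = \ln 2$ and is still accepted, consistent with $\Omega_{u,f}$ permitting zero-crossings of $u_m$ at finite times.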

It is required to specify an on-line adaptive scheme which ensures convergence of the controller so that the plant output matches or approximates the model output. Boundedness of the controller must also be assured.


2.2 Certain preparatory results 
PROPOSITION 1 

Let $f_1 : [0, \infty) \to \Re$ be continuous or have at most finitely many discontinuities of the first kind.

(1) If $\lim_{t\to\infty} f_1(t) = c$, then $\lim_{t\to\infty}(1/t)\int_0^t f_1(\tau)\,d\tau = c$.

(2) $\int_0^t f_1(\tau)\,d\tau$ is continuous for $t \in [0, \infty)$.
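Part 1 of proposition 1 is the familiar Cesàro-mean property. A minimal numerical sketch, assuming the illustrative function $f_1(t) = 2 + e^{-t}$ (so $c = 2$), shows the running mean $(1/t)\int_0^t f_1$ approaching $c$; for this $f_1$ the mean is exactly $2 + (1 - e^{-t})/t$, against which the trapezoidal estimate can be compared.

```python
import math

def running_mean(f, t, n=20_000):
    """Trapezoidal estimate of (1/t) * integral_0^t f(tau) dtau."""
    h = t / n
    total = 0.5 * (f(0.0) + f(t)) + sum(f(k * h) for k in range(1, n))
    return total * h / t

def f1(tau):
    return 2.0 + math.exp(-tau)        # f1(t) -> c = 2 as t -> infinity

for t in (1.0, 10.0, 100.0, 1000.0):
    exact = 2.0 + (1.0 - math.exp(-t)) / t
    print(f"t = {t:7.1f}   estimate = {running_mean(f1, t):.6f}   exact = {exact:.6f}")
```

The gap to $c$ shrinks only like $1/t$ here, which is why the limits in this paper are taken of time averages rather than of the raw integrals.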


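PROPOSITION 2

If $f_1, f_2 : [0, \infty) \to \Re$ satisfy $\int_0^\infty f_1^2(\tau)\,d\tau > 0$ and $\int_0^\infty f_2^2(\tau)\,d\tau > 0$, then
$$\lim_{t\to\infty}\left[\frac{1}{t}\int_0^t f_1^2(\tau)\,d\tau\right]\Big/\left[\frac{1}{t}\int_0^t f_2^2(\tau)\,d\tau\right]$$
exists (it may be infinite).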


Proof. Letting $N(t) = \int_0^t f_1^2(\tau)\,d\tau$ and $D(t) = \int_0^t f_2^2(\tau)\,d\tau$, the required limit is $\lim_{t\to\infty}[N(t)/D(t)]$. $N(t)$ and $D(t)$ are nonnegative and nondecreasing; hence $\lim_{t\to\infty} N(t)$ and


†The scope of this paper is restricted to inputs of this class. Results pertaining to inputs comprising signals that approach a sinusoid without and with a dc offset have also been obtained. In fact, the subscript $f$ (for 'final value') in $\Omega_{u,f}$ is meant to differentiate this class from the classes of functions which respectively approach a 'sinusoid' without an offset and a sinusoid with an offset ('combination', $\Omega_{u,c}$).
$\lim_{t\to\infty} D(t)$ exist. If both limits are finite or if only one of them is infinite, the proposition is proved. Else, the required limit has the form $\infty/\infty$. If, as $t \to \infty$, the order at which $N(t) \to \infty$ is higher than (the same as) [lower than] that at which $D(t) \to \infty$, then the required limit is infinite (positive and finite) [zero]. □


Lemma 1. If $u \in \Omega_{u,f}$ is the input and $y$ the output of a system $T(s)$ satisfying (A1)-(A4), then
$$\lim_{t\to\infty}\left[\frac{1}{t}\int_0^t y^2(\tau)\,d\tau\right]^{1/2} = |M_\infty T(0)|,$$
where $M_\infty = \lim_{t\to\infty} u(t)$.

Proof. By (A3) and (A4), $T(0)$ is finite and $T(s)$ has a state model
$$\dot{x}(t) = Ax(t) + Bu(t), \qquad y(t) = Cx(t),$$
with $T(s) = C(sI - A)^{-1}B$ and $\lim_{t\to\infty} Ce^{At}B = 0$. The zero-state response is $y(t) = \int_0^t Ce^{A(t-\tau)}Bu(\tau)\,d\tau$. Since $u \in \Omega_{u,f}$, $Ce^{A(t-\tau)}Bu(\tau)$ has at most finitely many discontinuities of the first kind. Then, by part 2 of proposition 1, $y$ is continuous. Clearly, $y$ is bounded and $\lim_{t\to\infty} y(t) = T(0)M_\infty$. By applying part 1 of proposition 1 to $y^2$,
$$\lim_{t\to\infty}\frac{1}{t}\int_0^t y^2(\tau)\,d\tau = T^2(0)M_\infty^2,$$
and the lemma follows. □
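Lemma 1 can be checked numerically on a first-order system. The sketch below assumes $T(s) = b/(s + a)$ with $a > 0$ (so that (A1)-(A4) hold and $T(0) = b/a$), discretizes it exactly under a zero-order hold, and compares the finite-horizon root-mean-square output with $|M_\infty T(0)|$; the parameter values and the input are illustrative only.

```python
import math

def rms_output(a, b, u, t_end=200.0, dt=1e-2):
    """Simulate T(s) = b/(s + a), a > 0, from zero state and return
    sqrt((1/t_end) * integral_0^t_end y^2 dt).

    The input is held constant over each step (zero-order hold), for which
    the one-step update below is exact."""
    decay = math.exp(-a * dt)
    y, acc = 0.0, 0.0
    for k in range(int(round(t_end / dt))):
        y_next = decay * y + (1.0 - decay) * (b / a) * u(k * dt)
        acc += 0.5 * (y * y + y_next * y_next) * dt     # trapezoidal rule for y^2
        y = y_next
    return math.sqrt(acc / t_end)

a, b, M = 2.0, -3.0, 1.5                     # T(0) = b / a = -1.5, M_inf = 1.5
u = lambda t: M * (1.0 - 0.5 * math.exp(-t) * math.cos(4.0 * t))   # a step-like input
print(rms_output(a, b, u), abs(M * b / a))   # both close to 2.25 for a long horizon
```

The small residual gap comes from the start-up transient, whose contribution to the time average decays like $1/t$.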


2.3 The performance index 

We use the foregoing results in constructing a performance index and provide an insight 
into its significance as we do so. 

Let $g_m$ and $g_p$ respectively denote the impulse responses of the reference model $G_m(s)$ and the unknown plant $G_p(s)$. Further, let $u_m \in \Omega_{u,f}$. The model output is $y_m(t) = \int_0^t g_m(t-\tau)u_m(\tau)\,d\tau$. As even an upper bound on the order of $G_p(s)$ is not known, we are left with a situation wherein $u_m$, $y_m$ and $y_p$ are the only functions available to generate the plant command $u_p$. Thus, $y_p(t) = \int_0^t g_p(t-\tau)u_p(\tau)\,d\tau$, where $u_p(t) = h(u_m, y_m, y_p, t)$.

As the aim is to specify $h$ such that $y_p$ matches or approximates $y_m$, the error $(y_m - y_p)$ deserves to be incorporated into the performance index. Since error minimization is sought over the range $[0, \infty)$ of time, $\int_0^\infty \{y_m(\tau) - y_p(\tau)\}^2\,d\tau$ may serve as a performance index. However, this would tend to infinity with even a mild mismatch between $y_m(t)$ and $y_p(t)$ as $t \to \infty$.

Consider
$$\int_0^\infty \{y_m(\tau) - y_p(\tau)\}^2\,d\tau = \int_0^\infty y_m^2(\tau)\,d\tau + \int_0^\infty y_p^2(\tau)\,d\tau - 2\int_0^\infty y_m(\tau)y_p(\tau)\,d\tau.$$

Even for a step input, $\int_0^\infty y_m^2(\tau)\,d\tau = \infty$. However, in the light of lemma 1, it is clear that when $u_m \in \Omega_{u,f}$,
$$\lim_{t\to\infty}\frac{1}{t}\int_0^t y_m^2(\tau)\,d\tau = G_m^2(0)\{\lim_{t\to\infty} u_m(t)\}^2 \in (0, \infty),$$
since $G_m(s)$ is asymptotically stable and without any zero at $s = 0$, and $\lim_{t\to\infty} u_m(t)$ is non-zero and finite. If $u_p \in \Omega_{u,f}$, then by lemma 1, in like fashion,
$$\lim_{t\to\infty}\frac{1}{t}\int_0^t y_p^2(\tau)\,d\tau = G_p^2(0)\{\lim_{t\to\infty} u_p(t)\}^2 \in (0, \infty).$$


By proposition 1 it can be shown that
$$\lim_{t\to\infty}\frac{1}{t}\int_0^t y_m(\tau)y_p(\tau)\,d\tau = G_m(0)G_p(0)\{\lim_{t\to\infty} u_m(t)\}\{\lim_{t\to\infty} u_p(t)\},$$
which also is finite and non-zero. Hence the performance index would be more encompassing if it comprised
$$\lim_{t\to\infty}\frac{1}{t}\int_0^t \{y_m(\tau) - y_p(\tau)\}^2\,d\tau.$$

This is the mean-square disparity between $y_m$ and $y_p$. Evidently, being a function of $u_m$, it will, in general, be different for different model inputs in $\Omega_{u,f}$. To facilitate cost comparison for different inputs, the candidate performance index can be refined to
$$\tilde{J}_f \triangleq \lim_{t\to\infty}\frac{(1/t)\int_0^t \{y_m(\tau) - y_p(\tau)\}^2\,d\tau}{(1/t)\int_0^t y_m^2(\tau)\,d\tau}.$$

Lemma 1 shows that the denominator of $\tilde{J}_f$ is positive and finite. We may therefore specify $\tilde{J}_f$ as a well-formed criterion for minimization. Nonetheless, it has the limitation that not all inputs belonging to $\Omega_{u,f}$ are concurrently considered. With a view to minimizing the worst error, as in $H^\infty$-optimal control, the performance index is chosen as
$$J_f \triangleq \sup_{u_m \in \Omega_{u,f}}\{d(u_m)\}, \qquad d(u_m) = \frac{\lim_{t\to\infty}(1/t)\int_0^t \{y_m(\tau) - y_p(\tau)\}^2\,d\tau}{\lim_{t\to\infty}(1/t)\int_0^t y_m^2(\tau)\,d\tau}. \tag{1}$$


The goal is to generate $u_p$ that minimizes $J_f$. Observe that $J_f$ is bounded below by zero.
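For signals known only on a finite horizon, the ratio $d(u_m)$ in (1) can only be estimated. The sketch below uses the hypothetical signals $y_m(t) = 3(1 - e^{-t})$ and $y_p(t) = 2.7(1 - e^{-2t})$, which are not outputs of any particular controller, and evaluates both integrals with the trapezoidal rule. Since both signals settle, proposition 1 suggests that the estimate should approach $(3 - 2.7)^2/3^2 = 0.01$ as the horizon grows. Note that $J_f$ itself is a supremum over all of $\Omega_{u,f}$ and cannot be obtained from a single input.

```python
import math

def disparity_ratio(y_m, y_p, t_end, n=50_000):
    """Finite-horizon estimate of d(u_m) in (1): the mean-square of y_m - y_p
    divided by the mean-square of y_m (the common 1/t factors cancel)."""
    h = t_end / n
    num = den = 0.0
    for k in range(n + 1):
        w = 0.5 if k in (0, n) else 1.0          # trapezoidal weights
        t = k * h
        e = y_m(t) - y_p(t)
        num += w * e * e
        den += w * y_m(t) ** 2
    return num / den

# Hypothetical settled signals: y_m -> 3 and y_p -> 2.7.
y_m = lambda t: 3.0 * (1.0 - math.exp(-t))
y_p = lambda t: 2.7 * (1.0 - math.exp(-2.0 * t))
for t_end in (50.0, 500.0):
    print(t_end, disparity_ratio(y_m, y_p, t_end))   # tends to (0.3 / 3)^2 = 0.01
```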
2.4 The approach to solution 


The signals $u_m$, $y_m$ and $y_p$ are the only ones available to specify the plant command $u_p$, as in figure 1. The adaptive controller in figure 1 represents a nonlinear time-varying gain $h$. For $u_m \in \Omega_{u,f}$, $h$ has to settle at a value whose magnitude and sign ensure that $y_p$ matches $y_m$ in magnitude and sign. Boundedness of $h$ is a sine qua non.

3. Suboptimal schemes 

We shall gradually build up to the optimal control law, starting in this section from two suboptimal schemes. This is because the suboptimal schemes provide an insight into the various aspects of the problem arising out of

• relaxing the assumptions on the plant, viz., information about its order and its being minimum or nonminimum phase; and

• the reference model inputs belonging to $\Omega_{u,f}$.

Figure 1. A schematic of the setup.

The following observation is pertinent before we set out to present the suboptimal laws.

Observation 1. It was emphasized in the previous section that $u_m$, $y_m$ and $y_p$ are the only signals in terms of which the plant command, $u_p$, has to be specified. Moreover, $J_f$ of (1) is independent of any parameter of the plant or model or otherwise. The problem is therefore not in the class of parametric optimization problems. Consequently, we cannot resort to methods such as gradient descent in some parametric space. This will lead us to laws and proofs that are interesting because of their unconventional approach, as will be demonstrated.

Lemma 2 (below) throws light on the aspect of on-line matching, in magnitude, of the model output by the plant output. On-line matching of the signs of the model and the plant outputs is ensured by lemma 3. Lemma 4 is a variant of lemma 3 and achieves the same purpose.

In what follows in this and the later sections, unless otherwise stated, it is understood that we carry forward the notations and specifications of § 2. Further, zero initial conditions are assumed. This assumption will be relaxed in corollary 2 towards the end of § 4.

We are interested in the matching of $y_m$ by $y_p$ for 'step-like' inputs. It will be instructive to explore the different candidates for the map $h$ of figure 1 by considering an example.

Example 1. Let $G_m(0) = 4$ and $G_p(0) = -1$. Then, with $u_m(t)$ a unit step, $\lim_{t\to\infty} y_m(t) = 4$ and $\lim_{t\to\infty} y_p(t) = -1$. If, in a control law of the form $u_p(t) = h(t)u_m(t)$, an $h$ satisfying $\lim_{t\to\infty} h(t) = -4$ were chosen, we could achieve the desired matching asymptotically. Toward this end, as a candidate for $h$, suppose we set $h_1(t) = y_m(t)/y_p(t)$. A little consideration will show that this may lead to unboundedness of the controller though $y_m$ is bounded. This is because it is attended by the problem of zero-crossings of $y_p$ due to:

(i) $G_p(s)$ being allowed to be nonminimum phase with unknown zero locations; and

(ii) zero-crossings of $u_m(t)$ for any finite $t$ being permitted by $\Omega_{u,f}$.
An $h_2(t)$ of the form $\int_0^t y_m / \int_0^t y_p$ also suffers from the same drawback. To circumvent such problems, we may as well look at a map $h$ with its denominator positive and nondecreasing, say. Then,



In computing $h_3$ from the above, the loss of sign information is evident. Nonetheless, putting off, for the present, the issue of sign-matching, we proceed by taking the cue from this last form, $h_3$, and investigate amplitude matching.
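The zero-crossing difficulty with $h_1 = y_m/y_p$ is easy to see numerically. As an illustration only, the sketch below borrows the transfer functions of example 2 (introduced later in this section), applies the same unit step to model and plant (i.e., $u_p = u_m$), uses the closed-form step responses obtained by partial fractions, locates the first zero crossing of the nonminimum-phase plant output, and evaluates $h_1$ ever closer to it. The helper names are hypothetical.

```python
import math

# Unit-step responses (partial fractions) of the example 2 transfer functions:
#   G_m(s) = 8(s+3)/((s+2)(s+4))  ->  y_m(t) = 3 - 2e^{-2t} - e^{-4t}
#   G_p(s) = (s-9)/((s+1)(s+10))  ->  y_p(t) = -0.9 + (10/9)e^{-t} - (19/90)e^{-10t}
def y_m(t):
    return 3.0 - 2.0 * math.exp(-2.0 * t) - math.exp(-4.0 * t)

def y_p(t):
    return -0.9 + (10.0 / 9.0) * math.exp(-t) - (19.0 / 90.0) * math.exp(-10.0 * t)

def first_zero_crossing(f, t_start, t_end, dt=1e-4):
    """Return the first t in (t_start, t_end] where f changes sign, refined by bisection."""
    t_prev, f_prev = t_start, f(t_start)
    t = t_start + dt
    while t <= t_end:
        f_now = f(t)
        if f_prev * f_now < 0.0:
            lo, hi = t_prev, t
            for _ in range(60):
                mid = 0.5 * (lo + hi)
                if f(lo) * f(mid) <= 0.0:
                    hi = mid
                else:
                    lo = mid
            return 0.5 * (lo + hi)
        t_prev, f_prev = t, f_now
        t += dt
    return None

t_zero = first_zero_crossing(y_p, 1e-3, 5.0)
print(t_zero)                                    # the undershoot of y_p ends near t = 0.17
for d in (1e-2, 1e-4, 1e-6):
    print(d, y_m(t_zero + d) / y_p(t_zero + d))  # |h_1| grows without bound near the crossing
```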

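Lemma 2. In the setup of figure 1, let $u_p(\cdot) = \alpha(\cdot)u_m(\cdot)$, where $u_m \in \Omega_{u,f}$ and
$$\alpha(t) \triangleq \begin{cases} 1; & 0 \le t \le T_1 < \infty,\ \exists t_1 \in (0, T_1) \ni y_m(t_1)y_p(t_1) \neq 0;\\ \left\{\left[\dfrac{1}{t}\displaystyle\int_0^t y_m^2(\tau)\,d\tau\right]\Big/\left[\dfrac{1}{t}\displaystyle\int_0^t y_p^2(\tau)\,d\tau\right]\right\}^{1/2}; & t > T_1. \end{cases} \tag{2}$$
Then

(1) $\alpha$ is bounded;

(2) $\lim_{t\to\infty}\alpha(t) = |G_m(0)/G_p(0)|^{1/2}$.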

Note. Intermediate steps in some proofs are indicated by subtitles.

A rigorous proof of this lemma has been worked out (Shankar 1993). Here we sketch an outline of that proof, giving some details here and there.

Proof. Denote $\lim_{t\to\infty} u_m(t) = M_m$. Then $\lim_{t\to\infty} y_m(t) = G_m(0)M_m$, and as $G_m(s)$ has no zero at $s = 0$, by part 1 of proposition 1,
$$\lim_{t\to\infty}\frac{1}{t}\int_0^t y_m^2(\tau)\,d\tau = G_m^2(0)M_m^2 \in (0, \infty). \tag{3}$$

Claim. $\alpha(t)$, $t \in [0, \infty)$, is finite.

The boundedness of $y_m$ and the definition of $T_1$ in (2) are pivotal in proving this claim. $T_1$ is a time by which the plant and model outputs have been excited. It is necessitated by the definition of $\Omega_{u,f}$, which allows $u_m$ to be zero, except at removable discontinuities, for some finite time. It serves an additional nontrivial purpose that will find mention in the section on simulations.

Claim. $\lim_{t\to\infty}\alpha(t)$ exists.

This follows from suitably applying proposition 2 to (2).

Our next aim will be to show that a settles to a positive, finite value. 

Claim. $\lim_{t\to\infty}\alpha(t) \neq \infty$.

Suppose $\lim_{t\to\infty}\alpha(t) = \infty$. Then $|\lim_{t\to\infty} u_p(t)| = \infty$. Two cases arise here.
Case 1. $\lim_{t\to\infty}|u_p(t)| = \infty$ such that $\lim_{t\to\infty}|y_p(t)| < \infty$.

As $G_p(s)$ may be nonminimum phase, it is necessary to consider a situation where $u_p$ may go unbounded such that $y_p$ remains bounded. This is possible iff each factor of the form $(s - \sigma_i)$, $\sigma_i > 0$, in the denominator of $U_p(s)$ is cancelled by a corresponding nonminimum phase zero, $(s - \sigma_i)$, of $G_p(s)$. Consequently, $\alpha(t)$ has to grow exponentially, resulting in $\lim_{t\to\infty}(\alpha^2(t)/t) = \infty$ and thence,
$$\lim_{t\to\infty}\alpha^2(t)\,\frac{1}{t}\int_0^t y_p^2(\tau)\,d\tau = \infty.$$
The LHS of the above is also, by virtue of (2) and (3),
$$\lim_{t\to\infty}\frac{1}{t}\int_0^t y_m^2(\tau)\,d\tau = G_m^2(0)M_m^2 < \infty.$$
Contradiction! Hence
$$\{\lim_{t\to\infty}\alpha(t) < \infty\}\ \text{OR}\ \{\lim_{t\to\infty}\alpha(t) = \infty\ \text{and}\ \lim_{t\to\infty}|y_p(t)| = \infty\}.$$
The first option proves the claim. The second leads to the following case.


Case 2. $\lim_{t\to\infty}|u_p(t)| = \infty$ such that $\lim_{t\to\infty}|y_p(t)| = \infty$.

Here, it can be shown, by invoking (2) and (3), that $\lim_{t\to\infty}\alpha(t) < \infty$, thus violating the supposition that $\lim_{t\to\infty}\alpha(t) = \infty$.

Hence the claim.

Part 1 of the lemma, namely, the boundedness of $\alpha$, now directly follows. Clearly, therefore, $u_p$ and hence $y_p$ are bounded too.


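Claim. $\lim_{t\to\infty}\alpha(t) \neq 0$.

Suppose not. Then $\lim_{t\to\infty} u_p(t) = 0$ and, by lemma 1, $\lim_{t\to\infty}(1/t)\int_0^t y_p^2(\tau)\,d\tau = 0$. Using (3) in (2),
$$\lim_{t\to\infty}\alpha^2(t) = \lim_{t\to\infty}\frac{\int_0^t y_m^2(\tau)\,d\tau}{\int_0^t y_p^2(\tau)\,d\tau} = \infty,$$
a contradiction. The claim is established.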

Remark 1. In fact, this bootstrapping property of $\alpha(t)$ for $t > T_1$, as demonstrated in the proof till now, is one of the motivations behind the definition of $\alpha$ as in (2).

Remark 2. As $\alpha$ and $u_m$ are bounded, an examination of $\int_0^t y_m^2(\tau)\,d\tau$ and $\int_0^t y_p^2(\tau)\,d\tau$ reveals, in view of part 2 of proposition 1, that $\alpha(t)$ is continuous in $t$, except possibly at $t = T_1$.


Claim. $\lim_{t\to\infty}(d\alpha(t)/dt) = 0$.

In view of remark 2, $u_p = \alpha u_m$ is bounded and has at most finitely many discontinuities of the first kind. Part 2 of proposition 1 then establishes that $y_m$ and $y_p$ are continuous. Thus $\alpha(t)$ is differentiable w.r.t. $t$, $t > T_1$. Then, by virtue of $\lim_{t\to\infty}\alpha(t) \in (0, \infty)$, the claim holds.
Evaluation of $\lim_{t\to\infty}\alpha(t)$: Let $\alpha_f = \lim_{t\to\infty}\alpha(t)$. Then $\lim_{t\to\infty} u_p(t) = \alpha_f M_m$. From the properties of $\alpha$ proved hitherto, it can be shown that $u_p \in \Omega_{u,f}$. Hence, by lemma 1 applied to $G_p(s)$ excited by $u_p$,
$$\lim_{t\to\infty}\frac{1}{t}\int_0^t y_p^2(\tau)\,d\tau = G_p^2(0)\alpha_f^2M_m^2 > 0.$$
Using this and (3) in (2) and simplifying,
$$\alpha_f = \left|\frac{G_m(0)}{G_p(0)}\right|^{1/2}. \qquad □$$


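With a small technical modification in lemma 2, corollary 1 follows directly.

COROLLARY 1

The results of lemma 2 hold with $\alpha(\cdot)$ redefined as
$$\alpha(t) = \begin{cases} 1; & 0 \le t \le T_1 + \varepsilon < \infty,\ T_1, \varepsilon > 0,\ \exists t_1 \in (0, T_1) \ni y_m(t_1)y_p(t_1) \neq 0;\\ \left\{\left[\dfrac{1}{t}\displaystyle\int_0^t y_m^2(\tau)\,d\tau\right]\Big/\left[\dfrac{1}{t}\displaystyle\int_0^{t-\varepsilon} y_p^2(\tau)\,d\tau\right]\right\}^{1/2}; & t > T_1 + \varepsilon. \end{cases}$$
□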
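A minimal simulation sketch of the law of lemma 2 follows. It assumes, purely for illustration, first-order model and plant $G_m(s) = 4/(s+1)$ and $G_p(s) = -1/(s+1)$, which have the dc gains of example 1, a unit-step $u_m$, $T_1 = 1$, the form (2) without the $\varepsilon$-delay of corollary 1, and an exact zero-order-hold discretization with step `dt`. With these data $\alpha$ should settle near $|4/(-1)|^{1/2} = 2$, so that $y_p$ approaches $-2$ while $y_m$ approaches 4, which is precisely the shortfall discussed in remark 3.

```python
import math

def simulate_alpha_law(gm0, gp0, t1=1.0, t_end=200.0, dt=0.01):
    """Sketch of the magnitude-matching law (2) of lemma 2 for a unit-step u_m,
    assuming first-order model and plant G_m(s) = gm0/(s+1), G_p(s) = gp0/(s+1).
    Returns (alpha, y_m, y_p) at t_end."""
    decay = math.exp(-dt)            # exact zero-order-hold update for 1/(s+1)
    ym = yp = 0.0
    im = ip = 0.0                    # running integrals of y_m^2 and y_p^2
    alpha, u_m = 1.0, 1.0
    for k in range(int(round(t_end / dt))):
        if k * dt > t1:
            alpha = math.sqrt(im / ip)   # ratio of mean squares; the 1/t factors cancel
        u_p = alpha * u_m
        ym = decay * ym + (1.0 - decay) * gm0 * u_m
        yp = decay * yp + (1.0 - decay) * gp0 * u_p
        im += ym * ym * dt
        ip += yp * yp * dt
    return alpha, ym, yp

alpha_f, ym_f, yp_f = simulate_alpha_law(4.0, -1.0)   # dc gains of example 1
print(alpha_f, ym_f, yp_f)   # alpha near 2, y_m near 4, y_p near -2
```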


Remark 3. Motivated by the discussion in example 1, a law was proposed in corollary 1 which, though blind to $\operatorname{sgn}\{G_m(0)\}$ vis-à-vis $\operatorname{sgn}\{G_p(0)\}$, was meant to ensure on-line magnitude matching. But such gain-matching is actually not achieved by corollary 1, as

• for the data of example 1, $\alpha_f = 2$, and not 4 as is required for gain-matching, and

• $\lim_{t\to\infty}|y_m(t)| = \lim_{t\to\infty}|y_p(t)|$ requires $\lim_{t\to\infty}\alpha(t) = 1$. This is obvious from (2) seen in the light of lemma 1.

Clearly, unless $|G_m(0)| = |G_p(0)|$, corollary 1 does not ensure gain-matching in magnitude. Moreover, if $|G_m(0)| = |G_p(0)|$, such gain-matching is itself superfluous. In other words, if in corollary 1 $\lim_{t\to\infty}\alpha(t)$ equals unity, $\alpha$ itself can be dispensed with from the control law! Nonetheless, the control $u_p$ of corollary 1 is a step towards dc gain-matching, because the insight gained by analysing its shortcomings will be exploited in the next section in the design of an optimal controller. In addition, the proof of lemma 2, which is essentially the same as that of corollary 1, simplifies the proof of theorem 1 (see § 4) as well.

Remark 4. More importantly, the control of corollary 1 is insensitive to $\operatorname{sgn}\{G_m(0)\}$ vis-à-vis $\operatorname{sgn}\{G_p(0)\}$. In handling plants and reference models which are allowed to be nonminimum phase, this further inadequacy of the control law of corollary 1 is obvious. Lemma 3 is meant to show how sign-matching can be achieved by introducing an additional factor $\beta : [0, \infty) \to \{1, -1\}$ into the law given by corollary 1.
Lemma 3. In the setup of figure 1, if $u_p(\cdot) = \alpha(\cdot)\beta(\cdot)u_m(\cdot)$, where $\alpha$ is given by corollary 1 and
$$\beta(t) = \begin{cases} 1; & 0 \le t \le T_\beta,\ T_\beta \in (T_1, \infty);\\ -1; & 2^iT_\beta < t \le 2^{i+1}T_\beta,\ u_p(2^iT_\beta)y_p(2^iT_\beta)G_m(0) < 0,\ i = 0, 1, 2, \ldots;\\ 1; & 2^iT_\beta < t \le 2^{i+1}T_\beta,\ u_p(2^iT_\beta)y_p(2^iT_\beta)G_m(0) > 0,\ i = 0, 1, 2, \ldots; \end{cases}$$
then

(1) $u_p$ and $y_p$ are bounded;

(2) $\lim_{t\to\infty}|u_p(t)| = |G_m(0)/G_p(0)|^{1/2}|M_m|$, $M_m = \lim_{t\to\infty}u_m(t)$;

(3) $\lim_{t\to\infty}|y_p(t)| = |G_m(0)G_p(0)|^{1/2}|M_m|$;

(4) the performance index
$$J_f = [G_m^2(0) + |G_m(0)G_p(0)| - 2|G_m(0)|^{3/2}|G_p(0)|^{1/2}]/G_m^2(0).$$
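The sign-matching mechanism of lemma 3 can be exercised on the data of example 2 (given later in this section). The sketch below is an assumption-laden illustration, not the simulation reported in this paper: it realizes $G_m(s)$ and $G_p(s)$ in partial-fraction form with exact zero-order-hold updates, uses the $\alpha$ of corollary 1 with an illustrative $\varepsilon = 0.05$, takes $T_1 = 1$ and $T_\beta = 2$, and applies a unit-step $u_m$. The value of $\beta$ is decided at the instants $2^iT_\beta$ from the sign of $u_py_pG_m(0)$, as in the definition above. One would expect $\beta \to -1 = \operatorname{sgn}\{G_m(0)G_p(0)\}$, $\alpha \to (3/0.9)^{1/2} \approx 1.8257$ and $y_p \to 1.6432$, the values quoted in example 2.

```python
import math

def zoh_modes(modes, dt):
    """Exact zero-order-hold coefficients (r, a, b) for a sum of first-order
    modes r/(s + p): x[k+1] = a*x[k] + b*u[k], output y = sum of r*x."""
    return [(r, math.exp(-p * dt), (1.0 - math.exp(-p * dt)) / p) for r, p in modes]

def advance(state, coeffs, u):
    """Advance every mode by one sample and return (new_state, output)."""
    new = [a * x + b * u for x, (_, a, b) in zip(state, coeffs)]
    return new, sum(r * x for x, (r, _, _) in zip(new, coeffs))

def simulate_lemma3(t1=1.0, eps=0.05, t_beta=2.0, t_end=300.0, dt=0.01):
    # Example 2 in partial-fraction form (used here only to generate data):
    #   G_m(s) = 8(s+3)/((s+2)(s+4)) = 4/(s+2) + 4/(s+4)
    #   G_p(s) = (s-9)/((s+1)(s+10)) = (-10/9)/(s+1) + (19/9)/(s+10)
    gm = zoh_modes([(4.0, 2.0), (4.0, 4.0)], dt)
    gp = zoh_modes([(-10.0 / 9.0, 1.0), (19.0 / 9.0, 10.0)], dt)
    gm0 = 3.0                        # G_m(0): the model is known, the plant is not
    xm, xp = [0.0, 0.0], [0.0, 0.0]
    ym = yp = im = 0.0
    ip = [0.0]                       # ip[k]: integral of y_p^2 over [0, k*dt]
    alpha = beta = 1.0
    decide_at = t_beta               # decision instants 2^i * T_beta
    lag = int(round(eps / dt))
    u_m = 1.0                        # unit-step model input, M_m = 1
    for k in range(int(round(t_end / dt))):
        t = k * dt
        if t > t1 + eps:             # corollary 1: y_p^2 integrated only up to t - eps
            alpha = math.sqrt(im / ip[k - lag])
        if t >= decide_at:           # lemma 3: beta from the sign of u_p * y_p * G_m(0)
            beta = -1.0 if alpha * beta * u_m * yp * gm0 < 0.0 else 1.0
            decide_at *= 2.0
        u_p = alpha * beta * u_m
        xm, ym = advance(xm, gm, u_m)
        xp, yp = advance(xp, gp, u_p)
        im += ym * ym * dt
        ip.append(ip[-1] + yp * yp * dt)
    return alpha, beta, ym, yp

alpha_f, beta_f, ym_f, yp_f = simulate_lemma3()
print(alpha_f, beta_f, ym_f, yp_f)   # expected near 1.8257, -1, 3.0 and 1.6432
```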

A comprehensive proof of this lemma has been worked out (Shankar 1993). Only an abridged version thereof will be presented here, to give a flavour of the nature of the issues involved and the methods employed to address them.

Proof. As seen in the proof of lemma 2 (vide (3)),
$$\lim_{t\to\infty}\frac{1}{t}\int_0^t y_m^2(\tau)\,d\tau = G_m^2(0)M_m^2 \in (0, \infty). \tag{4}$$

Claim. $\alpha$, $u_p$ and $y_p$ are bounded and $\lim_{t\to\infty}\alpha(t)$ exists.

This follows by suitably adapting the proof of lemma 2 and holds regardless of the existence of $\lim_{t\to\infty}\beta(t)$. It is important in this context to observe that $\beta = \pm 1$ and is allowed to change only at discrete instants of time separated by exponentially growing intervals. This is part 1 of the lemma.

The introduction of $\beta = \pm 1$ into the control law here does not require more than straightforward modifications in the proof of lemma 2 in establishing that $\lim_{t\to\infty}\alpha(t) \neq 0$ and
$$\lim_{t\to\infty}\frac{d\alpha(t)}{dt} = 0. \tag{5}$$

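Claim. $\lim_{t\to\infty}\beta(t)$ exists and equals $\operatorname{sgn}\{G_m(0)G_p(0)\}$.

Herein lies the focus of this lemma. By definition, $\beta(t)$ is constant for $t \in (2^iT_\beta, 2^{i+1}T_\beta]$, $i = 0, 1, 2, \ldots$. The interval length $(2^{i+1} - 2^i)T_\beta$ increases without bound as $i \to \infty$. Let $t_{l\beta}$ and $t_{u\beta}$ be such that, for some $i$, $2^iT_\beta < t_{l\beta} < t_{u\beta} \le 2^{i+1}T_\beta$. Suppose
$$\beta(t) = +1, \quad t \in [t_{l\beta}, t_{u\beta}]. \tag{6}$$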
Then for $t \in [t_{l\beta}, t_{u\beta}]$,
$$u_p(t) = \alpha(t)u_m(t). \tag{7}$$
As $u_m \in \Omega_{u,f}$ is differentiable as $t \to \infty$, $\exists\, t_{\beta1} \in \Re^+$ such that $\dot{u}_m(t)$ exists $\forall t > t_{\beta1}$. Choose
$$t_{l\beta} > t_{\beta1}. \tag{8}$$
Then $\dot{u}_m(t)$ exists $\forall t > t_{l\beta}$. So from (7), for $t \in (t_{l\beta}, t_{u\beta}]$, the interval in which $\beta$ is not allowed to change,
$$\dot{u}_p(t) = \dot{\alpha}(t)u_m(t) + \alpha(t)\dot{u}_m(t). \tag{9}$$
Assuming for the moment that (7) holds for all $t > t_{l\beta}$, (9) yields, in view of (5), $\lim_{t\to\infty}\dot{u}_p(t) = 0$, i.e.,
$$\text{for any } \varepsilon_{\beta1} > 0,\ \exists\, t_{\beta2} \in \Re^+ \ni |\dot{u}_p(t)| < \varepsilon_{\beta1}\ \forall t > t_{\beta2}. \tag{10}$$

Let
$$t_{l\beta} > t_{\beta2}. \tag{11}$$
Note that even though (7) does not necessarily hold for all $t > t_{l\beta}$, but only for $t \in [t_{l\beta}, t_{u\beta}]$, yet in view of (11),
$$|\dot{u}_p(t)| < \varepsilon_{\beta1}, \quad t_{l\beta} < t \le t_{u\beta}. \tag{12}$$
This is because of the causal nature of the system. Equation (12) was derived primarily to illustrate how system causality can be used to advantage in proving the existence of $\lim_{t\to\infty}\beta(t)$.

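Now, back to the assumption that (7) holds for all $t > t_{l\beta}$:
$$y_p(t) = \int_0^t g_p(t-\tau)u_p(\tau)\,d\tau = \int_0^{t_{l\beta}} C_pe^{A_p(t-\tau)}B_pu_p(\tau)\,d\tau + \int_{t_{l\beta}}^{t} C_pe^{A_p(t-\tau)}B_pu_p(\tau)\,d\tau, \tag{13}$$
where $(C_p, A_p, B_p)$ is a minimal realization of $G_p(s)$. As $\lim_{t\to\infty} e^{A_pt} = 0$,
$$\lim_{t\to\infty}\int_0^{t_{l\beta}} C_pe^{A_p(t-\tau)}B_pu_p(\tau)\,d\tau = \int_0^{t_{l\beta}}\Big[\lim_{t\to\infty}C_pe^{A_p(t-\tau)}\Big]B_pu_p(\tau)\,d\tau = 0,$$
i.e., for any $\varepsilon_{\beta2} > 0$, $\exists\, t_{\beta3} \in \Re^+ \ni \left|\int_0^{t_{l\beta}} C_pe^{A_p(t-\tau)}B_pu_p(\tau)\,d\tau\right| < \varepsilon_{\beta2}$ $\forall t > t_{\beta3}$. Let
$$t_{u\beta} > t_{\beta3}. \tag{14}$$
Then
$$\left|\int_0^{t_{l\beta}} C_pe^{A_p(t_{u\beta}-\tau)}B_pu_p(\tau)\,d\tau\right| < \varepsilon_{\beta2}. \tag{15}$$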
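The second term in the RHS of (13) will now be taken up. Integrating by parts,
$$\begin{aligned}\int_{t_{l\beta}}^{t} C_pe^{A_p(t-\tau)}B_pu_p(\tau)\,d\tau &= u_p(t)\int_{t_{l\beta}}^{t} C_pe^{A_p(t-\tau)}B_p\,d\tau - \int_{t_{l\beta}}^{t}\left[\int_{t_{l\beta}}^{\tau} C_pe^{A_p(t-\sigma)}B_p\,d\sigma\right]\dot{u}_p(\tau)\,d\tau\\ &= u_p(t)\int_{t_{l\beta}}^{t} C_pe^{A_p(t-\tau)}B_p\,d\tau + \int_{t_{l\beta}}^{t} C_pe^{A_p(t-\tau)}A_p^{-1}B_p\dot{u}_p(\tau)\,d\tau\\ &\quad - \int_{t_{l\beta}}^{t} C_pe^{A_p(t-t_{l\beta})}A_p^{-1}B_p\dot{u}_p(\tau)\,d\tau. \end{aligned} \tag{16}$$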

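Consider the first term in the RHS of (16):
$$u_p(t)\int_{t_{l\beta}}^{t} C_pe^{A_p(t-\tau)}B_p\,d\tau = u_p(t)\Big[C_pe^{A_p(t-\tau)}(-A_p^{-1})B_p\Big]_{\tau=t_{l\beta}}^{t} = u_p(t)[-C_pA_p^{-1}B_p] + u_p(t)C_pe^{A_p(t-t_{l\beta})}A_p^{-1}B_p. \tag{17}$$
As $\lim_{t\to\infty} C_pe^{A_p(t-t_{l\beta})}A_p^{-1}B_p = 0$ and as $u_p$ is bounded,

for any $\varepsilon_{\beta3} > 0$, $\exists\, t_{\beta4} \in \Re^+ \ni |u_p(t)C_pe^{A_p(t-t_{l\beta})}A_p^{-1}B_p| < \varepsilon_{\beta3}$ $\forall t > t_{\beta4}$.

Set
$$t_{u\beta} > t_{\beta4}. \tag{18}$$
Then
$$|u_p(t_{u\beta})C_pe^{A_p(t_{u\beta}-t_{l\beta})}A_p^{-1}B_p| < \varepsilon_{\beta3}. \tag{19}$$
Hence, using $G_p(0) = -C_pA_p^{-1}B_p$, it follows from (17) that
$$u_p(t_{u\beta})G_p(0) - \varepsilon_{\beta3} < u_p(t_{u\beta})\int_{t_{l\beta}}^{t_{u\beta}} C_pe^{A_p(t_{u\beta}-\tau)}B_p\,d\tau < u_p(t_{u\beta})G_p(0) + \varepsilon_{\beta3}. \tag{20}$$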


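Likewise, the second and the final terms in the RHS of (16) can be analysed to yield, respectively, the following:
$$\text{for any } \varepsilon_{\beta4} > 0, \quad \left|\int_{t_{l\beta}}^{t_{u\beta}} C_pe^{A_p(t_{u\beta}-\tau)}A_p^{-1}B_p\dot{u}_p(\tau)\,d\tau\right| < \varepsilon_{\beta4} \tag{21}$$
and
$$\text{for any } \varepsilon_{\beta5} > 0, \quad \left|\int_{t_{l\beta}}^{t_{u\beta}} C_pe^{A_p(t_{u\beta}-t_{l\beta})}A_p^{-1}B_p\dot{u}_p(\tau)\,d\tau\right| < \varepsilon_{\beta5}. \tag{22}$$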


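Using (20), (21) and (22) in (16) and rearranging,
$$u_p(t_{u\beta})G_p(0) - [\varepsilon_{\beta3} + \varepsilon_{\beta4} + \varepsilon_{\beta5}] < \int_{t_{l\beta}}^{t_{u\beta}} C_pe^{A_p(t_{u\beta}-\tau)}B_pu_p(\tau)\,d\tau < u_p(t_{u\beta})G_p(0) + [\varepsilon_{\beta3} + \varepsilon_{\beta4} + \varepsilon_{\beta5}]. \tag{23}$$
Substituting (23) and (15) into (13) and simplifying,
$$u_p(t_{u\beta})G_p(0) - [\varepsilon_{\beta2} + \varepsilon_{\beta3} + \varepsilon_{\beta4} + \varepsilon_{\beta5}] < y_p(t_{u\beta}) < u_p(t_{u\beta})G_p(0) + [\varepsilon_{\beta2} + \varepsilon_{\beta3} + \varepsilon_{\beta4} + \varepsilon_{\beta5}], \tag{24}$$
where $G_p(0) \neq 0$ by (A5).

Inequality (24) was derived with the initial assumption (6), i.e., $\beta(t) = 1$, $t \in [t_{l\beta}, t_{u\beta}]$. A scrutiny of the derivation is, however, sufficient to reveal that (24) would hold even if $\beta(t) = -1$, $t \in [t_{l\beta}, t_{u\beta}]$.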

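Before proceeding further with (24), it shall be shown that when $t_{u\beta}$ exceeds a threshold value, $u_p(t_{u\beta})$ must necessarily be non-zero, regardless of whether $\beta(t) = 1$ or $-1$.

Since $\lim_{t\to\infty} u_m(t) = M_m \neq 0$,

for any $\varepsilon_{\beta6} \in (0, |M_m|)$, $\exists\, t_{\beta5} \in \Re^+ \ni M_m - \varepsilon_{\beta6} < u_m(t) < M_m + \varepsilon_{\beta6}$ $\forall t > t_{\beta5}$.

Let
$$t_{u\beta} > t_{\beta5}. \tag{25}$$
Then, as $\alpha > 0$, $u_p(t_{u\beta}) = \beta(t_{u\beta})\alpha(t_{u\beta})u_m(t_{u\beta}) \neq 0$. Consequently, $\varepsilon_{\beta2}$, $\varepsilon_{\beta3}$, $\varepsilon_{\beta4}$ and $\varepsilon_{\beta5}$, which are arbitrarily small, can be so chosen that
$$0 < [\varepsilon_{\beta2} + \varepsilon_{\beta3} + \varepsilon_{\beta4} + \varepsilon_{\beta5}] < |u_p(t_{u\beta})|\cdot|G_p(0)|. \tag{26}$$
When (24) is seen in the light of (26), it is apparent that $y_p(t_{u\beta}) \neq 0$, that
$$\operatorname{sgn}\{y_p(t_{u\beta})\} = \operatorname{sgn}\{u_p(t_{u\beta})G_p(0)\},$$
and that
$$\operatorname{sgn}\{u_p(t_{u\beta})y_p(t_{u\beta})\} = \operatorname{sgn}\{u_p(t_{u\beta})\}\operatorname{sgn}\{u_p(t_{u\beta})G_p(0)\} = \operatorname{sgn}\{G_p(0)\}. \tag{27}$$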


Constraints (8) and (11) have to be simultaneously satisfied by $t_{l\beta}$. Let
$$t_{l\beta} > \max\{t_{\beta1}, t_{\beta2}\}. \tag{28}$$
Constraints (14), (18) and (25) have to be simultaneously satisfied by $t_{u\beta}$, which, in addition, has to exceed $t_{l\beta}$. Choose
$$t_{u\beta} > \max\{t_{l\beta}, t_{\beta3}, t_{\beta4}, t_{\beta5}\}. \tag{29}$$
From (28) and (29), $t_{l\beta}$ and $t_{u\beta}$ can be set such that, for some $i$, say $j$, $t_{l\beta} > 2^jT_\beta$ and $t_{l\beta} < t_{u\beta} = 2^{j+1}T_\beta$.

Equation (27) now implies $\operatorname{sgn}\{u_p(2^{j+1}T_\beta)y_p(2^{j+1}T_\beta)\} = \operatorname{sgn}\{G_p(0)\}$ and thence,
$$\operatorname{sgn}\{G_m(0)u_p(2^{j+1}T_\beta)y_p(2^{j+1}T_\beta)\} = \operatorname{sgn}\{G_m(0)G_p(0)\}. \tag{30}$$
It is emphasized here that (30) is true independent of the assumption that (7) holds for all $t > t_{l\beta}$. Also, it has been seen that $\beta(t) = -1$ for $t \in [t_{l\beta}, t_{u\beta}]$ leads to the same result. When (30) is compared with the definition of $\beta$, it is clear that, regardless of what $\beta$ was for $t \in (2^jT_\beta, 2^{j+1}T_\beta]$,
$$\beta(t) = \operatorname{sgn}\{G_m(0)G_p(0)\}, \quad t \in (2^{j+1}T_\beta, 2^{j+2}T_\beta]. \tag{31}$$
Hence it is established that
$$\lim_{t\to\infty}\beta(t) = \operatorname{sgn}\{G_m(0)G_p(0)\}. \tag{32}$$


Evaluation of $\lim_{t\to\infty}|u_p(t)|$ and $\lim_{t\to\infty}|y_p(t)|$. Let $\alpha_f = \lim_{t\to\infty}\alpha(t)$. We can arrive at
$$\alpha_f = \left|\frac{G_m(0)}{G_p(0)}\right|^{1/2}.$$
Then
$$\lim_{t\to\infty}|u_p(t)| = \left|\frac{G_m(0)}{G_p(0)}\right|^{1/2}|M_m|.$$
Hence $\lim_{t\to\infty}|y_p(t)| = |G_m(0)G_p(0)|^{1/2}|M_m|$. This completes the proof of parts 2 and 3.


The role of $\beta$ and evaluation of $J_f$. Applying lemma 1 to the plant excited by $u_p$, we get
$$\lim_{t\to\infty}\frac{1}{t}\int_0^t y_p^2(\tau)\,d\tau = |G_m(0)G_p(0)|M_m^2. \tag{33}$$
Now,
$$\lim_{t\to\infty} u_p(t) = \lim_{t\to\infty}\alpha(t)\beta(t)u_m(t) = \left|\frac{G_m(0)}{G_p(0)}\right|^{1/2}M_m\{\lim_{t\to\infty}\beta(t)\},$$
Table 1. Role of β.

Condition                                 | Consequence
sgn{M_m}   sgn{G_m(0)}   sgn{G_p(0)}      | lim β(t)   sgn{lim y_m(t)}   sgn{lim ȳ_p(t)}   sgn{lim y_p(t)}
   +1          +1            +1           |    +1            +1                +1                +1
   -1          +1            +1           |    +1            -1                -1                -1
   +1          -1            +1           |    -1            -1                +1                -1
   -1          -1            +1           |    -1            +1                -1                +1
   +1          +1            -1           |    -1            +1                -1                +1
   -1          +1            -1           |    -1            -1                +1                -1
   +1          -1            -1           |    +1            -1                -1                -1
   -1          -1            -1           |    +1            +1                +1                +1

Note. Here ȳ_p(t) denotes the plant output if β ≡ 1. All limits are taken as t → ∞.


and thence,
$$\lim_{t\to\infty} y_p(t) = G_p(0)\{\lim_{t\to\infty} u_p(t)\} = G_p(0)\left|\frac{G_m(0)}{G_p(0)}\right|^{1/2}M_m\{\lim_{t\to\infty}\beta(t)\}. \tag{34}$$


We have seen that
$$\lim_{t\to\infty} y_m(t) = G_m(0)M_m. \tag{35}$$
Table 1 brings out the effectiveness of $\beta$ in ensuring that $\lim_{t\to\infty} y_p(t)$ has the same sign as $\lim_{t\to\infty} y_m(t)$. In developing the table, (32), (34) and (35) have been used. All permissible combinations of $\operatorname{sgn}\{M_m\}$, $\operatorname{sgn}\{G_m(0)\}$ and $\operatorname{sgn}\{G_p(0)\}$ are considered.

It can be seen that, for all combinations of $M_m$, $G_m(0)$ and $G_p(0)$, the model and the plant outputs are of the same sign as $t \to \infty$. On the other hand, if $\beta \equiv 1$, i.e., if the control law of corollary 1 were employed, the model and plant outputs would have different signs as $t \to \infty$ whenever $G_m(0)$ and $G_p(0)$ have different signs.

It is now straightforward to show that
$$\lim_{t\to\infty}\frac{1}{t}\int_0^t y_m(\tau)y_p(\tau)\,d\tau = G_m(0)G_p(0)\left|\frac{G_m(0)}{G_p(0)}\right|^{1/2}M_m^2\{\lim_{t\to\infty}\beta(t)\}. \tag{36}$$
Using (4), (33) and (36), and simplifying using table 1, we get
$$\lim_{t\to\infty}\frac{1}{t}\int_0^t \{y_m(\tau) - y_p(\tau)\}^2\,d\tau = G_m^2(0)M_m^2 + |G_m(0)G_p(0)|M_m^2 - 2|G_m(0)|^{3/2}|G_p(0)|^{1/2}M_m^2. \tag{37}$$
Now, by (37) and (4),
$$d(u_m) = [G_m^2(0) + |G_m(0)G_p(0)| - 2|G_m(0)|^{3/2}|G_p(0)|^{1/2}]/G_m^2(0). \tag{38}$$


We make here an interesting observation, namely, that $d(u_m)$ is independent of $M_m$. Hence the RHS of (38) remains the same for all $u_m \in \Omega_{u,f}$. Therefore
$$J_f = [G_m^2(0) + |G_m(0)G_p(0)| - 2|G_m(0)|^{3/2}|G_p(0)|^{1/2}]/G_m^2(0). \qquad □$$

We now provide an alternative sign-matching scheme in the law of lemma 4 to follow. In view of lemma 3 and its proof, the motivation behind this scheme is relatively easy to grasp. We shall confine ourselves to making a few cursory comments on the similarities and differences between the two sign-matching schemes.


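Lemma 4. Lemma 3 holds even if $\beta$ there is replaced by
$$\beta(t) = \begin{cases} 1; & 0 \le t \le T_1 + \varepsilon,\ \varepsilon \in \Re^+;\\ -1; & t > T_1 + \varepsilon,\ G_m(0)\,[1/(t + \varepsilon)]\displaystyle\int_0^{t-\varepsilon}u_p(\tau)y_p(\tau)\,d\tau < 0;\\ 1; & \text{else}. \end{cases}$$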
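In code, the continuously adapted factor of lemma 4 reduces to a sign test on a running integral. The sketch below is hypothetical in its interface: the caller is assumed to accumulate $\int u_py_p$ up to $t - \varepsilon$ and pass it in. The positive normalizing factor in front of the integral is dropped because it cannot change the sign.

```python
def beta_continuous(t, t1, eps, gm0, int_up_yp):
    """Sign-matching factor of lemma 4 (sketch).

    int_up_yp is assumed to hold the integral of u_p(tau) * y_p(tau) accumulated
    by the caller up to t - eps. The positive normalizing factor multiplying the
    integral in lemma 4 cannot change the sign, so it is omitted here."""
    if t <= t1 + eps:
        return 1.0
    return -1.0 if gm0 * int_up_yp < 0.0 else 1.0

print(beta_continuous(5.0, 1.0, 0.05, 3.0, -2.0))   # G_m(0) > 0, integral < 0: beta = -1
```

Heuristically, once $u_p$ settles, $u_py_p$ takes the sign of $G_p(0)$ (compare (27)), so the test ends up selecting $\operatorname{sgn}\{G_m(0)G_p(0)\}$, as in lemma 3. The price is that the integral must be updated at every instant rather than only at the instants $2^iT_\beta$.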


The underlying principle exploited in both sign-matching schemes is essentially the same. In fact, table 1 is valid here also. The primary variation is that $\beta$ here is adapted continuously instead of at pre-specified instants as before. As a result, the computational effort involved here is relatively more. In the scheme of lemma 3, on the other hand, the time at which $\beta$ settles to its final value depends on the choice of $T_\beta$.

Lemma 3 demonstrates how the introduction of $\beta$ in the control law ensures the desired sign-matching. Thus both minimum and nonminimum phase plants and reference models have been successfully taken care of. In this sense, the control law of lemma 3 is a refinement of the law of corollary 1. It is, nonetheless, not optimal as judged by the performance index. Both these aspects are brought out in example 2.


Example 2.
$$G_m(s) = \frac{8(s+3)}{(s+2)(s+4)}, \qquad G_p(s) = \frac{s-9}{(s+1)(s+10)}, \qquad u_m(\cdot) \in \Omega_{u,f},\ M_m = 1.$$


With the law of corollary 1, $u_p = \alpha u_m$, we have $y_m(t) \to 3.0$, $\alpha_f = 1.8257$, $y_p(t) \to -1.6432$ and $J_f = 2.3954$. This large value of $J_f$ is obtained because, as $t \to \infty$, $y_m(t)$ and $y_p(t)$ have opposite signs even though their magnitudes are close. If, instead, the control law of lemma 3, $u_p = \alpha\beta u_m$, were employed, then $y_p(t) \to 1.6432$ due to $\beta(t)$, and $J_f = 0.2046$. Thus lemma 3 is a refinement of corollary 1.

Now, choose $u_p = -y_m$. Then $J_f = 0.01$. Thus the scheme of lemma 3 is suboptimal.

Obviously, the control law $u_p = -y_m$ can perform miserably for some other $G_p(s)$, whereas $u_p = \alpha\beta u_m$ will continue to force $J_f$ to the value specified in lemma 3. The former control was chosen with the details of $G_m(s)$ and $G_p(s)$ in mind. As $G_p(s)$ is indeed unknown, $J_f$ cannot be computed a priori. Nonetheless, lemma 3 specifies exactly how $J_f$ depends on $G_m(s)$ and $G_p(s)$.
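The numbers quoted in example 2 follow in closed form from lemma 2, lemma 3 and, because all signals here settle to constants, from $d(u_m) = (\lim y_m - \lim y_p)^2/(\lim y_m)^2$, a consequence of proposition 1. The short sketch below reproduces them from $G_m(0) = 24/8 = 3$ and $G_p(0) = -9/10$; the function name is illustrative.

```python
import math

def example2_limits(gm0=3.0, gp0=-0.9, m=1.0):
    """Asymptotic quantities for example 2 (G_m(0) = 24/8 = 3, G_p(0) = -9/10, M_m = 1)."""
    alpha_f = math.sqrt(abs(gm0 / gp0))          # lemma 2, part 2
    ym_inf = gm0 * m
    yp_cor1 = gp0 * alpha_f * m                  # corollary 1 (beta = 1)
    beta_f = math.copysign(1.0, gm0 * gp0)       # lemma 3: beta -> sgn{G_m(0) G_p(0)}
    yp_lem3 = gp0 * alpha_f * beta_f * m
    yp_neg = gp0 * (-ym_inf)                     # the hand-tuned law u_p = -y_m

    def d(yp_inf):
        # With settled signals, d(u_m) = (lim y_m - lim y_p)^2 / (lim y_m)^2.
        return (ym_inf - yp_inf) ** 2 / ym_inf ** 2

    return alpha_f, yp_cor1, yp_lem3, d(yp_cor1), d(yp_lem3), d(yp_neg)

alpha_f, yp1, yp3, jf_cor1, jf_lem3, jf_neg = example2_limits()
print(f"alpha_f = {alpha_f:.4f}, y_p -> {yp1:.4f} (corollary 1), {yp3:.4f} (lemma 3)")
print(f"J_f: corollary 1 = {jf_cor1:.4f}, lemma 3 = {jf_lem3:.4f}, u_p = -y_m: {jf_neg:.4f}")
```

The lemma 3 value also agrees with the closed form $[G_m^2(0) + |G_m(0)G_p(0)| - 2|G_m(0)|^{3/2}|G_p(0)|^{1/2}]/G_m^2(0)$.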

With this, corollary 1 and lemma 3 (or lemma 4) set the stage for the optimal control schemes that follow.
4. On-line-adaptive-optimal schemes

The schemes of § 3 do not ensure the desired asymptotic matching of the model output by the plant output. It was remarked (vide remark 3) that $\lim_{t\to\infty}\alpha(t) = 1$ is necessary and, further, that such an $\alpha$ is dispensable from the control law in the context of corollary 1. This holds even in respect of the laws of lemmas 3 and 4. Naturally, the question that we seek to answer is the following: which bounded function of $\alpha$, say $\gamma$, simultaneously ensures, in the present framework, the convergence $\alpha \to 1$ and, in turn, itself converges to $|G_m(0)/G_p(0)|$? If we succeed in constructing such a function, we will have solved the problem posed in § 2. This is precisely the motivation for the gain-matching schemes. Sign-matching as ensured by lemma 3 or 4 must indeed continue to hold. A scheme $A_f$, which proposes a $\gamma(\alpha)$ updated only at specified instants of time and which is shown to be $\varepsilon$-optimal, has been reported (Shankar 1993). Algorithm $A_{fo}$, to be proposed here, is in pursuit of optimality. Unlike in $A_f$, adaptation in $A_{fo}$ is continuous. A second optimal scheme, $A^*_{fo}$, that is capable of 'improving', in some sense, on $A_{fo}$ will then be presented. Finally, corollary 2, below, shows that the presence of non-zero initial conditions of the plant is not a deterrent to the results presented in this paper.


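Algorithm $A_{fo}$: The system is set up as in figure 1, with

$u_m(\cdot) \in \Omega_{u,f}$, $\alpha(\cdot)$ as given in lemma 3, and

$\beta(\cdot)$ as given in lemma 3 or 4;
$$\gamma(t) = \begin{cases} 1; & 0 \le t \le T_1 + \varepsilon,\ \varepsilon \in \Re^+;\\ 1 + \eta\displaystyle\int_0^{t-\varepsilon}\frac{\alpha(\tau) - 1}{\alpha(\tau) + 1}\,d\tau; & t > T_1 + \varepsilon,\ 1 + \eta\displaystyle\int_0^{t-\varepsilon}\frac{\alpha(\tau) - 1}{\alpha(\tau) + 1}\,d\tau > 0,\ \eta \in \Re^+;\\ 0; & \text{else}. \end{cases}$$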

The control law: u p (-) ^ □ 
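The following sketch assumes that $\gamma$ has the form $1 + \eta\int_0^{t-\varepsilon}[(\alpha(\tau) - 1)/(\alpha(\tau) + 1)]\,d\tau$ for $t > T_1 + \varepsilon$, clipped at zero, with $\gamma = 1$ up to $T_1 + \varepsilon$. This form is consistent with $\gamma \ge 0$ and with the integrand $(\alpha - 1)/(\alpha + 1)$ used in the proof of theorem 1, but the exact expression should be taken from the definition of algorithm $A_{fo}$. The function performs one forward-Euler step of the integral; its name and argument list are illustrative.

```python
def gamma_update(integral, alpha_delayed, t, t1, eps, eta, dt):
    """One Euler step of the (assumed) gain-matching factor of algorithm A_fo.

    integral holds int_0^{t-eps} (alpha - 1)/(alpha + 1) dtau so far, and
    alpha_delayed is alpha(t - eps). Returns (new_integral, gamma)."""
    integral += (alpha_delayed - 1.0) / (alpha_delayed + 1.0) * dt
    if t <= t1 + eps:
        return integral, 1.0
    g = 1.0 + eta * integral
    return integral, (g if g > 0.0 else 0.0)   # gamma is clipped at zero

print(gamma_update(0.0, 3.0, 5.0, 1.0, 0.05, 2.0, 0.1))   # alpha > 1: gamma grows above 1
```

For $\alpha > 0$ the integrand lies in $(-1, 1)$, so under this form $\gamma$ changes at a rate of at most $\eta$; it grows while $\alpha > 1$ and shrinks while $\alpha < 1$.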

Theorem 1. Let the control law be given by algorithm $A_{fo}$. Then

(1) $\alpha$, $\gamma$, $u_p$ and $y_p$ are bounded;

(2) $\lim_{t\to\infty}\alpha(t) = 1$;

(3) $\lim_{t\to\infty}\beta(t) = \operatorname{sgn}\{G_m(0)G_p(0)\}$;

(4) $\lim_{t\to\infty}\gamma(t) = |G_m(0)/G_p(0)|$;

(5) $\lim_{t\to\infty}u_p(t) = \operatorname{sgn}\{G_m(0)G_p(0)\}\,|G_m(0)/G_p(0)|\,M_m$;

(6) $\lim_{t\to\infty}y_p(t) = \lim_{t\to\infty}y_m(t) = G_m(0)M_m$; and

(7) the performance index $J_f = 0$.
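Parts 3 to 6 of theorem 1 are mutually consistent for every sign pattern of $G_m(0)$, $G_p(0)$ and $M_m$, in the same spirit as table 1. A short arithmetic check in Python follows; the dc-gain magnitudes are arbitrary illustrations.

```python
import itertools
import math

def theorem1_limits(gm0, gp0, m):
    """Asymptotic values claimed in theorem 1 (parts 3 to 5) for non-zero dc gains
    gm0 = G_m(0), gp0 = G_p(0) and input limit m = M_m, plus the implied limit of y_p."""
    sgn = math.copysign(1.0, gm0 * gp0)
    beta_f = sgn                                   # part 3
    gamma_f = abs(gm0 / gp0)                       # part 4
    up_f = sgn * gamma_f * m                       # part 5
    yp_f = gp0 * up_f                              # plant dc gain times the limit of u_p
    return beta_f, gamma_f, up_f, yp_f

# Part 6 should hold for every sign combination.
for s_m, s_g, s_p in itertools.product((1.0, -1.0), repeat=3):
    gm0, gp0, m = s_g * 3.0, s_p * 0.9, s_m * 1.5
    *_, yp_f = theorem1_limits(gm0, gp0, m)
    print(s_m, s_g, s_p, yp_f, gm0 * m)            # the last two columns coincide
```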
Outline of proof. It may be observed that, by definition, $\gamma \ge 0$. The existence of $\alpha_f = \lim_{t\to\infty}\alpha(t)$ is ensured by proposition 2 as before. Now,
$$\{\alpha_f < 1 \Rightarrow \lim_{t\to\infty}\gamma(t) = 0\} \Rightarrow \{\lim_{t\to\infty}y_p(t) = 0\} \Rightarrow \{\alpha_f = \infty\},$$
a contradiction. It can also be shown that
$$\{\alpha_f > 1 \Rightarrow \lim_{t\to\infty}\gamma(t) = \infty\} \Rightarrow \{\lim_{t\to\infty}|y_p(t)| = \infty\} \Rightarrow \{\alpha_f = 0\},$$
a contradiction again. Thus $\alpha_f = 1$.

Note that for $\lim_{t\to\infty}\gamma(t)$ to exist, it is sufficient that the integrand $[(\alpha - 1)/(\alpha + 1)] \to 0$. This in turn can happen iff $\alpha_f = 1$. So $\lim_{t\to\infty}\gamma(t)$ exists.

Hereafter, after ensuring the existence of $\lim_{t\to\infty}\beta(t)$ as before, the various limits can be directly evaluated and the rest of the theorem proved. □

The problem stated in § 2 is hereby solved. 

We are now in a position to propose an alternative optimal scheme, $A^*_{fo}$. The cue for this second scheme is contained in the fact that, by theorem 1, $\alpha_f = 1$.

Algorithm $A^*_{fo}$. This algorithm is identical to algorithm $A_{fo}$ except that the control law is given by

Theorem 2. The results of theorem 1 hold with the control law specified by algorithm $A^*_{fo}$. □

The proof readily follows from the proof of theorem 1, taken with the results that $\alpha$ is bounded and that it smoothly settles at unity. After all, in that event, the control of $A^*_{fo}$ approaches the control of $A_{fo}$.

$A^*_{fo}$ is optimal, like $A_{fo}$, in the sense of $J_f$. Also, in both cases, to start with, $\alpha(t) = 1$, $t \in (0, T_1]$, and finally, $\alpha_f = 1$. It may therefore appear on the face of it that $A^*_{fo}$ will contribute a mere unnecessary transient excursion. That, however, is not the case: it does possess some additional interesting properties. We will reserve our comments on this issue until after presenting the simulations of the two laws.

Non-zero initial conditions: In all the schemes discussed thus far, zero initial conditions were assumed (vide § 3); i.e., $x_p(0^+) = 0$. As the plant considered here is asymptotically stable, any unforced dynamics (due to non-zero initial conditions) has to decay smoothly and at an exponential rate. Such dynamics, if any, may be looked upon as the effect of some specific, yet unknown, finite-time perturbation in the control. Seen in this light, the results presented in this paper are not compromised. This aspect is formally brought out in the following corollary.

COROLLARY 2

Corollary 1, lemmas 3 and 4, and theorems 1 and 2 hold even if $\alpha(\cdot)$ is consistently replaced by $\delta(\cdot)$, where
$$\delta(t) = \begin{cases} \rho(t); & 0 \le t \le T_1 + \varepsilon,\ T_1, \varepsilon \in \Re^+,\ \{\exists t_1 \in (0, T_1) \ni y_m(t_1)y_p(t_1) \neq 0\};\\ \alpha(t); & \text{else}; \end{cases}$$
and $\rho : [0, T_1] \to \Re$ is bounded and continuous or has at most finitely many discontinuities of the first kind. □


It may be noted that the condition

$$\{\exists\, t_1 \in (0, T_1) \ni y_m(t_1)\,y_p(t_1) \neq 0\}$$

sets only a lower bound for $T_1$: $T_1$ is simply required to be greater than the time at which $u_m$ begins to activate the outputs of the model and the plant.

Remark 5. Before closing this section, a few words on the formulation of the various schemes proposed in this paper are in order. With an eye on tracking plants that exhibit either continuous or frequent small drifts, or relatively large, infrequent jumps, the schemes are designed to remain continually sensitive to any deviation from the desired performance and to modify the control suitably. This puts both the proposed schemes and the proofs given here in proper perspective.


5. Simulations 

Reporting simulations here serves two purposes. The first is to facilitate comparison between the control laws presented in this paper, so as to demonstrate the progressive refinement remarked on in the previous sections. The second is to provide insight into the roles played by $T_1$ and $\eta$, which are chosen by the user, i.e., to highlight how, with the same law, better transient responses may be obtained by a suitable choice of $T_1$ and $\eta$. In all simulations, therefore, we use the same reference model, $G_m(s)$, and one of the two plants, $G_{p1}(s)$ and $G_{p2}(s)$. Thus,

reference model: $G_m(s) = 1/(s + 2)$,

plant 1: $G_{p1}(s) = (s + 2)/[(s + 1)(s + 3)(s + 4)]$ and
plant 2: $G_{p2}(s) = (s - 2)/[(s + 1)(s + 3)(s + 4)]$.

Plant 1 is minimum phase like the model, whereas plant 2 is nonminimum phase. They
are both chosen to have the same poles to bring to focus the effect of the change in the 
location of the zero. It may be observed that the model has no pole in common with either 
of the plants. We have 

$$G_m(0)/G_{p1}(0) = -G_m(0)/G_{p2}(0) = 3,$$

which is the desired magnitude of the controller gain. 
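The value 3 follows from evaluating each transfer function at $s = 0$. As a quick check, the sketch below does this with exact rational arithmetic; the helper names `poly_at` and `dc_gain` are ours, introduced only for illustration.

```python
from fractions import Fraction

def poly_at(coeffs, s):
    """Evaluate a polynomial (coefficients highest power first) by Horner's rule."""
    acc = Fraction(0)
    for c in coeffs:
        acc = acc * s + c
    return acc

def dc_gain(num, den_factors):
    """G(0) for G(s) = num(s) / product of the factor polynomials in den_factors."""
    den = Fraction(1)
    for factor in den_factors:
        den *= poly_at(factor, 0)
    return poly_at(num, 0) / den

plant_den = [[1, 1], [1, 3], [1, 4]]     # (s + 1)(s + 3)(s + 4)
g_m = dc_gain([1], [[1, 2]])             # 1/(s + 2)      -> 1/2
g_p1 = dc_gain([1, 2], plant_den)        # (s + 2)/(...)  -> 1/6
g_p2 = dc_gain([1, -2], plant_den)       # (s - 2)/(...)  -> -1/6

print(g_m / g_p1, g_m / g_p2)            # prints: 3 -3
```

The sign flip for plant 2 is why sign-matching, with $\beta$ settling at $-1$, becomes necessary in the nonminimum-phase cases.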

In all cases, the reference model input is chosen as $u_m(t) = 5u(t)$, where $u(t)$ is the standard unit step, and $\epsilon = 0.1$ throughout.
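For orientation, the open-loop responses of $G_m$, $G_{p1}$ and $G_{p2}$ to $u_m(t) = 5u(t)$ can be reproduced with a small fixed-step integrator. This is a sketch under our own assumptions (a controllable canonical state-space realization and classical fourth-order Runge-Kutta integration). It drives each transfer function directly with the step and does not implement any of the adaptive laws; the function name `step_response` is hypothetical.

```python
def step_response(den, num, amplitude=5.0, t_end=10.0, dt=1e-3):
    """Response of num(s)/den(s) to amplitude * u(t) from zero initial state.

    den: monic denominator coefficients, highest power first, e.g. [1, 8, 19, 12]
    num: numerator coefficients, highest power first, degree < len(den) - 1
    """
    n = len(den) - 1
    b = [0.0] * (n - len(num)) + [float(c) for c in num]  # pad to n coefficients

    def deriv(x):
        # controllable canonical form: x[i]' = x[i+1]; last state carries the dynamics
        last = amplitude - sum(den[n - i] * x[i] for i in range(n))
        return x[1:] + [last]

    def output(x):
        # y = sum over i of (coefficient of s^i in num) * x[i]
        return sum(b[n - 1 - i] * x[i] for i in range(n))

    x = [0.0] * n
    ys = []
    for _ in range(int(round(t_end / dt))):
        k1 = deriv(x)
        k2 = deriv([xi + 0.5 * dt * k for xi, k in zip(x, k1)])
        k3 = deriv([xi + 0.5 * dt * k for xi, k in zip(x, k2)])
        k4 = deriv([xi + dt * k for xi, k in zip(x, k3)])
        x = [xi + dt / 6.0 * (p + 2 * q + 2 * r + w)
             for xi, p, q, r, w in zip(x, k1, k2, k3, k4)]
        ys.append(output(x))
    return ys

plant_den = [1, 8, 19, 12]                 # (s + 1)(s + 3)(s + 4) expanded
y_m = step_response([1, 2], [1])           # reference model 1/(s + 2)
y_p1 = step_response(plant_den, [1, 2])    # plant 1, minimum phase
y_p2 = step_response(plant_den, [1, -2])   # plant 2, nonminimum phase
print(round(y_m[-1], 3), round(y_p1[-1], 3), round(y_p2[-1], 3))  # 2.5 0.833 -0.833
```

The output of plant 2 first moves positive before settling at $-5/6$, the wrong-way response typical of a right-half-plane zero.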

The plant response in the absence of control is designated $\bar{y}_p$ and is shown along with $y_m$ and $y_p$.


Figure 4. (a) Reference model and plant responses, and (b) controller performance for case 3.

Case 1. Plant 2; the control $u_p = \alpha u_m$ of lemma 2; $T_1 = 2.0$; $\eta = 0.2$; figure 2. $\alpha \to \sqrt{3}$, and sign-mismatch exists, as expected.

Case 2. Plant 2; the control $u_p = \alpha\beta u_m$ of lemma 4; $T_1 = 2.0$; $\eta = 0.2$; figure 3. $\alpha \to \sqrt{3}$ as before, but sign-matching is achieved. The prolonged transient in $y_p$ is due to the delay in $\beta$ settling at $-1$. A better transient response is often desirable.

Case 3 below is meant to show how a different choice of $T_1$ in an otherwise identical situation can provide a substantially improved $y_p$.

Note. The difference in the spreads of the time axes between one figure and another has 
to be recognized before interpreting the plots. 

Case 3. Plant 2; the control $u_p = \alpha\beta u_m$ of lemma 4; $T_1 = 1.0$; $\eta = 0.2$; figure 4. While the asymptotic behaviour is as in case 2, the following improvements are noteworthy.

(i) Much better transient behaviour of $y_p$ and faster convergence.

(ii) Quicker settling of $\beta$ to $-1$.

(iii) $\alpha$ not only settles faster, but its peak value is reduced by a factor of 2.

Case 4. Plant 1; the control $u_p = \beta\gamma u_m$ of algorithm $A_{\gamma\beta}$ with $\beta$ of lemma 4; $T_1 = 2.0$; $\eta = 0.2$; figure 5. $\alpha \to 1$; unlike in the earlier cases, $\beta$ remains at $+1$. The convergence of $\gamma$ to 3, the desired value, is the highlight of this case. Amplitude-matching is achieved asymptotically, though not yet by $t = 200$.

The rate of convergence of $\gamma$, and hence of $y_p$, however, leaves much to be desired. It was this shortcoming that motivated, as stated more than once in § 4, the second algorithm $A^*_{\gamma\beta}$ which, though optimal like $A_{\gamma\beta}$ in the sense of $J_f$, is an ‘improvement’ over the latter. Case 5 is meant to demonstrate this.

Case 5. Plant 1; the control $u_p = \alpha\beta\gamma u_m$ of algorithm $A^*_{\gamma\beta}$ with $\beta$ of lemma 4; $T_1 = 2.0$; $\eta = 1.0$; figure 6. Observe that the plots are for $t \in [0, 50]$ here. The initial excursion of $\alpha$ from unity has evidently contributed to the improvement in the overall gain $\alpha\gamma$, and hence in $y_p$, in addition to reshaping $\gamma$ itself for the better. Of further interest is the faster convergence of $\alpha$ itself in this case as compared to case 4. Our claim that algorithm $A^*_{\gamma\beta}$ compares favourably with algorithm $A_{\gamma\beta}$ is thus borne out.

Even here, there is scope for fine-tuning the controller by a suitable choice of $\eta$. Consider case 6.

Case 6. Plant 1; the control $u_p = \alpha\beta\gamma u_m$ of algorithm $A^*_{\gamma\beta}$ with $\beta$ of lemma 4; $T_1 = 2.0$; $\eta = 2.0$; figure 7. Each of $\alpha$, $\gamma$, $\alpha\gamma$ and $y_p$ behaves more desirably here than in case 5.

Case 7. Plant 2; the control $u_p = \alpha\beta\gamma u_m$ of algorithm $A^*_{\gamma\beta}$ with $\beta$ of lemma 4; $T_1 = 2.0$; $\eta = 0.2$; figure 8. $\beta$ comes into play here. The trajectories of the gains and of $y_p$ before and after the convergence of $\beta$ to $-1$ are markedly different. The efficacy of algorithm $A^*_{\gamma\beta}$ is clearly borne out.

Finally, we seek to answer the following question: with any of the schemes proposed in this paper, can we improve upon the response of case 7 for plant 2 (which is, of course, unknown)? The last case provides an answer. Only one change is made from case 7, namely, $T_1$ is set to 0.5.

Case 8. Plant 2; the control $u_p = \alpha\beta\gamma u_m$ of algorithm $A^*_{\gamma\beta}$ with $\beta$ of lemma 4; $T_1 = 0.5$; $\eta = 0.2$; figure 9. The answer is clearly in the affirmative. While the peak value of the overall gain $\alpha\gamma$ may have gone up compared to that in case 7, it has, paradoxically, contributed to a decrease in the peak overshoot of $y_p$.

Remark 6. A systematic analysis of the simulations reveals an interesting phenomenon. 
The successive refinements in the laws are due to the improvements in the individual factors involved in the control. When even one of $\alpha$, $\beta$, $\gamma$ or $y_p$ moves towards its desirable value, it induces all the others toward their respective desirable values. Conversely, any tendency to undesirable behaviour on the part of any one of the above variables is reflected in all the rest, so that such a tendency is effectively countered. This combination of inverse
relationships between some signals and boot-strapping within the same signal is seen to 
yield a significant enhancement in the overall performance. 















Figure 9. (a) Reference model and plant responses, and (b) controller performance for case 8. Note the change in the time axis betwe…
6. Conclusion

In the context of stable MRAC, on relaxing the requirement that the order, or an upper bound on the order, of the plant be known, and by allowing the plant and the reference model to have nonminimum phase zeros, a restriction on the permissible model inputs arises as a natural consequence. Persistency of excitation is done away with. For inputs from a class of ‘step-like’ functions, the problem is readily formulated in an optimal control framework. Using sub-optimal schemes, which are proposed at the beginning as building blocks, on-line adaptive optimal schemes are designed. The schemes ensure boundedness of the controller and its asymptotic stability.

This paper improves upon the ε-optimal schemes reported in an earlier work (Sha 1993) for a class of adaptive control problems only recently addressed. The striking features of the algorithms proposed here are their simplicity and their entirely time-domain implementation.

The foregoing schemes are attractive in the context of enhanced performance in …tion control problems where (i) a reduced-order model is often inadequate, and (ii) the parameters of the transfer function vary with the operating conditions. MRAC has been tried out in connection with aircraft control, in power systems and for the automatic steering of ships (van Amerongen & Udink ten Cate 1975; Arie et al 1986). This work is a step towards applications such as autopilots for ships and positioning systems of missile launchers.

For inputs belonging to a class of ‘sinusoid-like’ functions, without and with dc offset, adaptive laws have been designed and similar optimal tuning achieved.


This paper is dedicated to Dr R M Umesh. 

The first author wishes to thank Prof. M A L Thathachar for many useful discussions. Dr V Jayashankar, G Ramasubramanian, M T Arvind and K Krishna have kindly extended their help. Our thanks to them.
Abstract. This paper introduces a novel methodology for clustering symbolic objects using Genetic Algorithms (GAs). GAs are a family of computational models inspired by evolution. These algorithms encode potential solutions to specific problems on simple chromosome-like data structures and apply recombination operators to these structures so as to preserve critical information. A new type of representation for the chromosome structure is presented here, along with a new method for mutation. The efficacy of the proposed method is examined by application to numeric data with a known number of classes and also to assertion-type symbolic objects drawn from the domains of fat oil, microcomputers, microprocessors and botany. The validity of the clusters obtained is examined.

Keywords. Symbolic clustering; symbolic similarity; symbolic dissimilarity; 
genetic algorithms; path length; spanning length; best spanning length. 


1. Introduction 

In conventional data analysis, objects are taken as numerical vectors. The clustering of such objects is achieved by minimizing intra-cluster dissimilarity and maximizing inter-cluster dissimilarity. Surveys of cluster analysis can be found in the literature (Duda & Hart 1973; Diday & Simon 1976; Bock 1987; Diday et al 1987; Jain & Dubes 1988; Diday 1989).

Symbolic objects are extensions of classical data types. In conventional data sets, the 
objects are individualized, whereas in symbolic data sets, they are more unified by means 
of relationships. Based on the complexity, the symbolic objects can be of assertion, hoard 
or synthetic type. Some references to clustering of symbolic objects can be found in Diday 
(1990), Gowda & Diday (1991a, 1992). 

A symbolic clustering methodology that makes use of GAs is proposed in this paper. An implementation of a genetic algorithm begins with a population of (typically random) chromosomes. One then evaluates these structures and allocates reproductive opportunities in such a way that chromosomes which represent a better solution to the target problem are given more chances to reproduce than those which are poorer solutions. The goodness of a solution is typically defined with respect to the current population. In a broader usage of the term, a genetic algorithm is any population-based model that uses selection and recombination operators to generate new sample points in a search space. The standard GA can be represented as shown in figure 1.

BEGIN /* genetic algorithm */
    Generate initial population
    Compute fitness of each individual
    WHILE NOT finished DO
    BEGIN /* Produce new generation */
        FOR population size DO
        BEGIN /* Reproductive cycle */
            Select two individuals from old generation for mating
                /* biased in favour of fitter ones */
            Recombine the two individuals to give two offspring
            Compute fitness of the two offspring
            Insert offspring in new generation
        END
        IF population has converged THEN
            finished := TRUE
    END
END

Figure 1. A traditional genetic algorithm.
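A runnable sketch of the loop in figure 1, applied to a toy bit-string problem (maximize the number of ones). The toy problem, binary-tournament selection, one-point crossover and the `max_gens` safety cap are our assumptions for illustration; figure 1 fixes only the overall structure of the loop.

```python
import random

def tournament(pop, fit, rng):
    """Binary tournament: selection biased in favour of fitter individuals."""
    i, j = rng.randrange(len(pop)), rng.randrange(len(pop))
    return pop[i] if fit[i] >= fit[j] else pop[j]

def crossover(p1, p2, rng):
    """One-point crossover (assumed recombination operator) giving two offspring."""
    cut = rng.randrange(1, len(p1))
    return p1[:cut] + p2[cut:], p2[:cut] + p1[cut:]

def genetic_algorithm(fitness, n_bits, pop_size=30, max_gens=200, seed=0):
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    fit = [fitness(ind) for ind in pop]          # compute fitness of each individual
    gens = 0
    while gens < max_gens:                       # WHILE NOT finished (with a cap)
        new_pop, new_fit = [], []
        while len(new_pop) < pop_size:           # reproductive cycle
            a, b = tournament(pop, fit, rng), tournament(pop, fit, rng)
            for child in crossover(a, b, rng):
                if len(new_pop) < pop_size:      # insert offspring in new generation
                    new_pop.append(child)
                    new_fit.append(fitness(child))
        pop, fit = new_pop, new_fit
        gens += 1
        if all(ind == pop[0] for ind in pop):    # population has converged
            break
    best = max(range(pop_size), key=fit.__getitem__)
    return pop[best], fit[best], gens

best, best_fit, gens = genetic_algorithm(sum, n_bits=20)
print(best_fit, gens)
```

Without mutation, as in figure 1, the population typically loses diversity and converges; the cap only guards against an unbounded run.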


2. Proposed modified similarity and dissimilarity measures 

Similarity between A and B is written as

$$S(A, B) = S(A_1, B_1) + \cdots + S(A_k, B_k).$$

For the kth feature, $S(A_k, B_k)$ is defined using the following three components:

(1) $S_p(A_k, B_k)$ due to position p,

(2) $S_s(A_k, B_k)$ due to span s,

(3) $S_c(A_k, B_k)$ due to content c.

Dissimilarity between A and B is written as

$$D(A, B) = D(A_1, B_1) + \cdots + D(A_k, B_k).$$

For the kth feature, $D(A_k, B_k)$ is defined using the following three components:

(1) $D_p(A_k, B_k)$ due to position p,

(2) $D_s(A_k, B_k)$ due to span s,

(3) $D_c(A_k, B_k)$ due to content c.
2.1 Quantitative interval type of $A_k$ and $B_k$

Let

$a_l$ and $a_u$ represent the lower and upper limits of interval $A_k$,
$b_l$ and $b_u$ represent the lower and upper limits of interval $B_k$,
inters = length of intersection of $A_k$ and $B_k$,
$l_s$ = span length of $A_k$ and $B_k$ $= |\max(a_u, b_u) - \min(a_l, b_l)|$,

where max() and min() represent the maximum and minimum values respectively. The similarity and dissimilarity between two samples $A_k$ and $B_k$ are defined on position and span. Similarity due to position is defined as

$$S_p(A_k, B_k) = \sin\left[\left(1 - \frac{|a_l - b_l|}{u_k}\right) \times 90^\circ\right]. \tag{2}$$

Similarity due to span is defined as

$$S_s(A_k, B_k) = \sin\left[\frac{l_a + l_b}{2\, l_s} \times 90^\circ\right], \tag{3}$$

where $u_k$ denotes the length of the maximum interval of the kth feature and

$l_a = |a_u - a_l|$,
$l_b = |b_u - b_l|$.

Net similarity between $A_k$ and $B_k$ is

$$S(A_k, B_k) = S_p(A_k, B_k) + S_s(A_k, B_k). \tag{4}$$

Dissimilarity due to position is defined as

$$D_p(A_k, B_k) = \cos\left[\left(1 - \frac{|a_l - b_l|}{u_k}\right) \times 90^\circ\right]. \tag{5}$$

Dissimilarity due to span is defined as

$$D_s(A_k, B_k) = \cos\left[\frac{l_a + l_b}{2\, l_s} \times 90^\circ\right]. \tag{6}$$

Net dissimilarity between $A_k$ and $B_k$ is

$$D(A_k, B_k) = D_p(A_k, B_k) + D_s(A_k, B_k). \tag{7}$$
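Equations (2)-(7) translate directly into code. A minimal sketch for one interval-valued feature, assuming the angles are in degrees and treating a zero span length (two identical point values) as a span ratio of 1, an edge case the text does not discuss. The function name `interval_measures` is ours.

```python
import math

def _sin_deg(x):
    return math.sin(math.radians(x))

def _cos_deg(x):
    return math.cos(math.radians(x))

def interval_measures(a, b, u_k):
    """Return (S, D) for intervals a = (a_l, a_u), b = (b_l, b_u), eqs (2)-(7).

    u_k is the length of the maximum interval of the feature (normalizer).
    """
    a_l, a_u = a
    b_l, b_u = b
    l_a, l_b = abs(a_u - a_l), abs(b_u - b_l)
    l_s = abs(max(a_u, b_u) - min(a_l, b_l))          # span length
    pos = (1 - abs(a_l - b_l) / u_k) * 90             # position argument, degrees
    ratio = (l_a + l_b) / (2 * l_s) if l_s > 0 else 1.0  # assumed value when l_s = 0
    span = ratio * 90                                  # span argument, degrees
    s = _sin_deg(pos) + _sin_deg(span)                 # eq (4) = (2) + (3)
    d = _cos_deg(pos) + _cos_deg(span)                 # eq (7) = (5) + (6)
    return s, d

print(interval_measures((1, 3), (1, 3), u_k=4))   # identical intervals: S ~ 2, D ~ 0
print(interval_measures((0, 1), (2, 3), u_k=3))   # both angles 30 deg: S ~ 1, D ~ sqrt(3)
```

Because each similarity component is a sine and the matching dissimilarity component is the cosine of the same angle, identical intervals give the maximum similarity of 2 and a dissimilarity of 0.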

2.2 Qualitative type of $A_k$ and $B_k$

For qualitative features, the similarity and dissimilarity components due to position are absent. The two components that contribute to similarity and dissimilarity are

(1) span,

(2) content.

Let $l_a$ = length of $A_k$, i.e. the number of elements in $A_k$,
$l_b$ = length of $B_k$, i.e. the number of elements in $B_k$,
inters = number of elements common to $A_k$ and $B_k$,
$l_s$ = span length of $A_k$ and $B_k$ combined $= l_a + l_b - \text{inters}$.

The similarity component due to span is defined as

$$S_s(A_k, B_k) = \sin\left[\frac{l_a + l_b}{2\, l_s} \times 90^\circ\right]. \tag{8}$$

The similarity component due to content is defined as

$$S_c(A_k, B_k) = \sin\left[\frac{\text{inters}}{l_s} \times 90^\circ\right]. \tag{9}$$

Net similarity between $A_k$ and $B_k$ is

$$S(A_k, B_k) = S_s(A_k, B_k) + S_c(A_k, B_k). \tag{10}$$

The dissimilarity component due to span is defined as

$$D_s(A_k, B_k) = \cos\left[\frac{l_a + l_b}{2\, l_s} \times 90^\circ\right]. \tag{11}$$

The dissimilarity component due to content is defined as

$$D_c(A_k, B_k) = \cos\left[\frac{\text{inters}}{l_s} \times 90^\circ\right]. \tag{12}$$

Net dissimilarity between $A_k$ and $B_k$ is

$$D(A_k, B_k) = D_s(A_k, B_k) + D_c(A_k, B_k). \tag{13}$$
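The set-valued measures (8)-(13) can be sketched the same way, assuming angles in degrees and that at least one of the two sets is non-empty (otherwise $l_s = 0$). The function name `set_measures` is ours.

```python
import math

def set_measures(A, B):
    """Return (S, D) for set-valued features A_k, B_k, eqs (8)-(13)."""
    A, B = set(A), set(B)
    l_a, l_b = len(A), len(B)
    inters = len(A & B)
    l_s = l_a + l_b - inters                 # equals the size of the union
    span = (l_a + l_b) / (2 * l_s) * 90      # span argument, degrees
    content = inters / l_s * 90              # content argument, degrees
    s = math.sin(math.radians(span)) + math.sin(math.radians(content))  # eq (10)
    d = math.cos(math.radians(span)) + math.cos(math.radians(content))  # eq (13)
    return s, d

# Fatty-acid feature of linseed and perilla oil from the fat oil data:
# span angle 75 deg, content angle 60 deg
print(set_measures({"L", "Ln", "O", "P", "M"}, {"L", "Ln", "O", "P", "S"}))
```

For two disjoint singletons the content term contributes nothing to S and its full value of 1 to D, which is the intended behaviour of the content component.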


In conventional data analysis, whenever two samples that are merged are to be represented by a single sample, one frequently used method is to take the mean of the two as the single representative. In symbolic data analysis, the concept of a composite symbolic object (Gowda & Diday 1991a, 1992) is used. A new method of forming a composite symbolic object is proposed, in which

$a_{lm} = am \times (n_1/n)$,
$b_{lm} = bm \times (n_2/n)$,

where a and b are the lowest and highest values considering the n samples, $n_1$ represents the number of samples between a and m, and $n_2$ represents the number of samples between b and m.

3. Methodology 

The problem here is to find the number of classes and the class memberships in a data set of N samples. In order to obtain the natural groups, the following method is proposed. The methodology makes use of the principle of mutation to obtain better solutions. Another important feature of the proposed method is that it makes use of a new type of representation for the chromosome structure.
Figure 2. Formation of mutated offspring (labels: crossover point 1, crossover point 2, mutated offspring).

Stage 1: Generate C random solutions. Each solution contains the samples connected in some order. If there are N samples, randomly generate a number between 0 and N; let it be $A_0$. Now generate one more number other than $A_0$; let it be $B_0$. Connect $A_0$ and $B_0$. Next generate a number between 0 and N other than $A_0$ and $B_0$; let it be $C_0$. Connect $B_0$ and $C_0$. Repeat this until all N numbers are included. The solution is then in the form of a string (chromosome).

Compute the dissimilarity between $A_0$ and $B_0$, which gives the path length between them. In the same way, find the path length between $B_0$ and $C_0$, and so on. The sum of all the path lengths in the string represents the spanning length.

Repeat the above procedure for C solutions and hence find the strings and corresponding 
spanning lengths of each solution. 

This would form the initial population. 
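Stage 1 amounts to building C random orderings of the samples and scoring each by its spanning length. A minimal sketch, assuming samples are indexed 0 to N - 1 and that `dissim` is a symmetric dissimilarity between two sample indices (for symbolic data, for example, the sum of $D(A_k, B_k)$ over features). Drawing unused indices one at a time, as in the text, is equivalent to a random permutation, which is what `shuffle` produces. The function names are ours.

```python
import random

def spanning_length(order, dissim):
    """Sum of path lengths (dissimilarities) between consecutive samples."""
    return sum(dissim(order[i], order[i + 1]) for i in range(len(order) - 1))

def initial_population(n_samples, n_solutions, dissim, seed=0):
    """Stage 1: C random strings over samples 0..N-1 with their spanning lengths."""
    rng = random.Random(seed)
    population = []
    for _ in range(n_solutions):
        order = list(range(n_samples))
        rng.shuffle(order)                 # same as drawing unused indices one by one
        population.append((order, spanning_length(order, dissim)))
    return population

# Toy one-dimensional data with absolute difference as the dissimilarity
points = [0.0, 1.0, 2.0, 3.0]

def d(i, j):
    return abs(points[i] - points[j])

for order, length in initial_population(len(points), 3, d):
    print(order, length)
```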

Stage 2: Having C random spanning lengths, our aim is to obtain the best spanning length, i.e. the one with the least value. To obtain it, we select an individual from the initial population and mutate it. The mutation is done in two phases.

Phase 1: The mutation is done by generating two crossover points and rearranging the 
samples in order to obtain a different string structure (chromosome). This is depicted in 
figure 2. 

Phase 2: The mutated offspring is further subjected to mutation as follows: Each sample 
in the string has two links connected to two different samples. Among the two links, the 
one that has the higher dissimilarity value is selected and replaced by a shorter link. This
process is repeated for all the samples in the string. 
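The two mutation phases leave some room for interpretation. The sketch below is one possible reading, stated as an assumption rather than the authors' exact operator: phase 1 reverses the segment between two random crossover points, and phase 2, at each interior sample, tries to replace its longer link by a shorter one using a 2-opt segment reversal (a standard tour-improvement move swapped in here; the text does not name it). The dissimilarity is assumed symmetric, and the function names are ours.

```python
import random

def spanning_length(order, d):
    """Sum of dissimilarities between consecutive samples of a string."""
    return sum(d(order[i], order[i + 1]) for i in range(len(order) - 1))

def phase1(order, rng):
    """Assumed phase 1: pick two crossover points, reverse the segment between them."""
    i, j = sorted(rng.sample(range(len(order)), 2))
    return order[:i] + order[i:j + 1][::-1] + order[j + 1:]

def phase2(order, d):
    """Assumed phase 2: at each interior sample, try to replace its longer link
    by a shorter one via a 2-opt segment reversal (d must be symmetric)."""
    order = list(order)
    n = len(order)
    for k in range(1, n - 1):
        # index of the longer of the two links at order[k]; link i joins order[i], order[i+1]
        i = k - 1 if d(order[k - 1], order[k]) >= d(order[k], order[k + 1]) else k
        best_gain, best_j = 0.0, None
        for j in range(i + 1, n):
            # reversing order[i+1..j] swaps links (i, i+1), (j, j+1) for (i, j), (i+1, j+1)
            tail_old = d(order[j], order[j + 1]) if j + 1 < n else 0.0
            tail_new = d(order[i + 1], order[j + 1]) if j + 1 < n else 0.0
            gain = d(order[i], order[i + 1]) + tail_old - d(order[i], order[j]) - tail_new
            if gain > best_gain:
                best_gain, best_j = gain, j
        if best_j is not None:
            order[i + 1:best_j + 1] = order[i + 1:best_j + 1][::-1]
    return order

points = [0.0, 5.0, 1.0, 4.0, 2.0, 3.0]

def d(i, j):
    return abs(points[i] - points[j])

start = list(range(len(points)))
offspring = phase2(phase1(start, random.Random(1)), d)
print(spanning_length(start, d), spanning_length(phase2(start, d), d))
```

Phase 2 only accepts reversals with a positive gain, so it never lengthens a string; phase 1 may, which is what lets the search leave a poor local arrangement.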

Stage 3: The process of mutation, i.e. stage 2, is repeated until 90% of the selected population has the same spanning length.

Table 1. Experimental results of randomly generated classes.

No. of Gaussian     No. of samples   Mean values used for        Classes obtained by
clusters generated  generated        generating samples          proposed method
2                   80               1, 3                        2
3                   120              1, 3, 5                     3
4                   160              1, 3, 5, 7                  4
5                   200              1, 3, 5, 7, 9               5
6                   240              1, 3, 5, 7, 9, 11           6
7                   280              1, 3, 5, 7, 9, 11, 13       7
8                   320              1, 3, 5, 7, 9, 11, 13, 15   8

Stage 4: Among the C solutions, the one that has the least spanning length is selected; it represents the best spanning length. This string structure represents the best solution to the given data. In order to obtain the classes, we adopt the following procedure.

(1) Compute the dissimilarities between adjacent samples in the string, i.e. the path lengths.

(2) Identify the inconsistent length, which is the path length having the highest value between two samples. Remove the inconsistent length to form two groups.

(3) Merge all the samples of each group to form two composite symbolic objects.

(4) Compute the similarity S and dissimilarity D between the two groups. If the dissimilarity D is greater than the similarity S, the inconsistent length is removed so as to result in two clusters; otherwise the inconsistent length is placed back.

(5) Repeat steps 1 to 4 until no sample has dissimilarity greater than similarity.
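Steps (1)-(5) can be sketched for a single interval-valued feature as below. Two assumptions are ours and are flagged in the code: the composite of a group is taken as the interval hull of its members (simpler than the composite-object rule given earlier), and "repeat until no sample has dissimilarity greater than similarity" is read as recursively testing the longest remaining link of each segment. `measures`, `hull` and `extract_clusters` are hypothetical names.

```python
import math

def measures(a, b, u_k):
    """(S, D) for two intervals, eqs (2)-(7), angles in degrees."""
    l_a, l_b = abs(a[1] - a[0]), abs(b[1] - b[0])
    l_s = abs(max(a[1], b[1]) - min(a[0], b[0]))
    pos = math.radians((1 - abs(a[0] - b[0]) / u_k) * 90)
    span = math.radians(((l_a + l_b) / (2 * l_s) if l_s > 0 else 1.0) * 90)
    return math.sin(pos) + math.sin(span), math.cos(pos) + math.cos(span)

def hull(intervals):
    """Assumed composite object: smallest interval covering all members."""
    return min(lo for lo, _ in intervals), max(hi for _, hi in intervals)

def extract_clusters(order, samples, u_k):
    clusters, stack = [], [list(order)]
    while stack:
        seg = stack.pop()
        if len(seg) < 2:
            clusters.append(seg)
            continue
        # (1) path lengths between adjacent samples in the string
        links = [measures(samples[seg[i]], samples[seg[i + 1]], u_k)[1]
                 for i in range(len(seg) - 1)]
        # (2) the inconsistent length is the longest link; cut it tentatively
        cut = max(range(len(links)), key=links.__getitem__)
        left, right = seg[:cut + 1], seg[cut + 1:]
        # (3) composite objects of the two groups
        s, d = measures(hull([samples[i] for i in left]),
                        hull([samples[i] for i in right]), u_k)
        # (4) keep the cut only if dissimilarity exceeds similarity
        if d > s:
            stack.extend([left, right])   # (5) examine each part again
        else:
            clusters.append(seg)          # link placed back; segment is final
    return sorted(clusters)

samples = [(0, 1), (0.5, 1.5), (1, 2), (10, 11), (10.5, 11.5), (11, 12)]
print(extract_clusters(range(6), samples, u_k=12))   # [[0, 1, 2], [3, 4, 5]]
```

In this toy string the long link between samples 2 and 3 is cut because the two hulls are far apart (D > S), while within each half the composites overlap heavily and the tentative cut is undone.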


4. Results of simulation 

In order to corroborate the efficacy of the algorithm, several simulation studies were 
the results of which are given below. The clusterings obtained using the proposed r 
are examined for their validity using Hubert’s T statistics (Jain & Dubes 1988) apf 
In order to compare the results, the validity of the clustering structures obtained b; 
methods is presented in table 11 below. 
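For readers unfamiliar with Hubert's Γ, here is a minimal sketch of the normalized statistic as described by Jain & Dubes (1988): the correlation, over all pairs of samples, between the pairwise dissimilarity and a 0/1 indicator that the two samples lie in different clusters. The Monte Carlo helper estimates how often random relabellings score at least as well. How the paper converts Γ into the significance values of table 11 is not stated here, so `monte_carlo_fraction` is our assumption, not the authors' procedure.

```python
import random

def normalized_gamma(dissim, labels):
    """Pearson correlation over pairs i < j between dissim[i][j] and the
    indicator 'samples i and j lie in different clusters'."""
    n = len(labels)
    xs, ys = [], []
    for i in range(n):
        for j in range(i + 1, n):
            xs.append(dissim[i][j])
            ys.append(1.0 if labels[i] != labels[j] else 0.0)
    m = len(xs)
    mx, my = sum(xs) / m, sum(ys) / m
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

def monte_carlo_fraction(dissim, labels, trials=1000, seed=0):
    """Fraction of random label permutations whose Gamma is >= the observed one."""
    rng = random.Random(seed)
    observed = normalized_gamma(dissim, labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(trials):
        rng.shuffle(shuffled)
        if normalized_gamma(dissim, shuffled) >= observed:
            hits += 1
    return hits / trials

xs = [0.0, 0.1, 0.2, 10.0, 10.1, 10.2]
dissim = [[abs(a - b) for b in xs] for a in xs]
labels = [0, 0, 0, 1, 1, 1]
print(normalized_gamma(dissim, labels), monte_carlo_fraction(dissim, labels, trials=200))
```

A Γ close to 1 means that large dissimilarities occur almost exclusively between clusters; the labels must contain at least two distinct values for the indicator to vary.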


Table 2. Description of 2 classes of Iris data.

Cluster no.  Feature 1   Feature 2   Feature 3   Feature 4
1            5.47-5.92   2.55-2.73   3.62-4.19   1.13-1.31
2            4.83-4.86   3.07-3.09   1.38-1.43   0.28-0.35
Figure 3. Plot of iris data; x-axis: feature SW; y-axis: feature PL.


Example 1. The first example is such that the input data are of numeric type and the 
output data are symbolic. The objects of numeric type were drawn from a mixture of 
normal distributions with a known number of classes and classification so that the results 
show the efficacy of the algorithm. The test set was drawn from a mixture of C normal distributions with mean $m_i$ and covariance matrix $C_i$ having individual variances of 0.15 and zero covariances. The different values of the number of classes and the means chosen
are shown in table 1 and the test samples were independently generated using a Gaussian 
vector generator. The proposed method was used on this test data set. As indicated in 
table 1, there is perfect agreement between the number of classes used for generating 
Gaussian clusters and the number of classes obtained by the proposed method. In all 
the seven cases, the classification results were in full agreement with the test samples 
generated. 
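The synthetic data of Example 1 can be regenerated along these lines. Table 1 implies 40 samples per cluster and means 1, 3, 5, and so on; the dimensionality of the samples is not stated in this part of the text, so the two-dimensional samples below (same mean on every coordinate) are an assumption, as is the function name `gaussian_clusters`.

```python
import math
import random

def gaussian_clusters(n_clusters, per_cluster=40, dim=2, variance=0.15, seed=0):
    """Samples from n_clusters normal distributions with means 1, 3, 5, ...,
    individual variances `variance` and zero covariances.
    Returns (samples, true_labels)."""
    rng = random.Random(seed)
    sd = math.sqrt(variance)                 # random.gauss expects a standard deviation
    samples, labels = [], []
    for c in range(n_clusters):
        mean = 1 + 2 * c
        for _ in range(per_cluster):
            samples.append([rng.gauss(mean, sd) for _ in range(dim)])
            labels.append(c)
    return samples, labels

for k in range(2, 9):                        # the seven rows of table 1
    data, truth = gaussian_clusters(k)
    print(k, len(data))
```

With a standard deviation of about 0.39 and means two units apart, neighbouring clusters are well separated, which is consistent with the perfect agreement reported in table 1.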





Table 3. Fat oil data.

Sample        Sp. gravity (g/cm3)  m.p. (°C)    Io. value  Sa. value  Fatty acids
Linseed oil   0.930-0.935          -27 to -8    170-204    118-196    L, Ln, O, P, M
Perilla oil   0.930-0.937          -5 to -4     192-208    188-197    L, Ln, O, P, S
Cotton seed   0.916-0.918          -6 to -1     94-113     189-198    L, O, P, M, S
Sesame oil    0.920-0.926          -6 to -4     104-116    187-193    L, O, P, S, A
Camellia      0.916-0.917          -21 to -15   80-82      189-193    L, O
Olive oil     0.914-0.919          0 to 6       79-90      187-196    L, O, P, S
Beef tallow   0.860-0.870          30 to 38     40-48      190-199    O, P, M, S, C
Lard          0.858-0.864          22 to 32     53-77      190-202    L, O, P, M, S, Lu

Abbreviations: Io - iodine; Sa - saponification; L - linoleic acid; Ln - linolenic acid; O - oleic acid; P - palmitic acid; M - myristic acid; S - stearic acid; A - arachic acid; C - capric acid; Lu - lauric acid
Table 4. Description of classes in fat oil data.

Cluster no.  Samples       Sp. gravity (g/cm3)  m.p. (°C)      Io. value       Sa. value       Fatty acids
1            0,1,2,3,4,5   0.917-0.933          -7.75 to +3    191.25-196.75   165.87-183.25   L, L…
2            6,7           0.862-0.864          28.75-34.62    49.75-60.25     194.87-195.62   L, C…

Abbreviations: Io - iodine; Sa - saponification; L - linoleic acid; Ln - linolenic acid; O - oleic acid; P - palmitic acid; M - myristic acid; S - stearic acid; A - arachic acid; C - capric acid; Lu - lauric acid


Example 2. This example is chosen to demonstrate the efficacy of the algorithm in clustering data belonging to two classes with many overlaps. The data set used is the well-known iris data set of measurements of iris flowers; two species are considered here, with 50 patterns for each species and four features, namely petal length, petal width, sepal length and sepal width. The proposed method was applied to these two classes of the iris data, having 100 samples. The method resulted in two classes in perfect agreement with the data set considered. Two symbolic objects representing the two classes are shown in table 2, which also gives descriptive information about the classes. The plot of the two classes obtained is shown in figure 3.


Table 5. Microcomputer data.

Microcomputer       Display    RAM (k)  ROM (k)  MP     Keys
Apple II            Colour TV  48       10       6502   52
Atari 800           Colour TV  48       10       6502   57-63
Commodore VIC 20    Colour TV  32       11-16    6502A  64-73
Exidy Sorcerer      B&W TV     48       4        Z80    57-63
Zenith H8           Built-in   64       1        8080A  64-73
Zenith H89          Built-in   64       8        Z80    64-73
HP-85               Built-in   32       80       HP     92
Horizon             Terminal   64       8        Z80    57-63
Ohio Sc Challenger  B&W TV     32       10       6502   53-56
Ohio Sc II Series   B&W TV     48       10       6502C  53-56
TRS-80 I            B&W TV     48       12       Z80    53-56
TRS-80 III          Built-in   48       14       Z80    64-73
Table 6. Description of classes in microcomputer data.

Cluster no.  Samples                   Display                                  RAM (k)  ROM (k)  MP     Keys
1            0,1,2,3,4,5,7,8,9,10,11   Colour TV, B&W TV, Terminal, Built-in   32, 64   6-10     6502X  55-73
2            6                         Built-in                                 32       80       HP     92

Table 7. Botanical data of 9 trees of 3 classes.

Class 1 (Annonaceae)   Class 2 (Clusiaceae)    Class 3 (Mimosaceae)
0 degjknpCFR           3 aehilmoswxzDFR        6 begikmpBJKPR
1 dfhjknpCFR           4 cfgilmoswxzDFR        7 begiknpBJKPR
2 dehjknpCFR           5 cegjlmoswxzDFR        8 begjkmpBJKPR

Example 3. The data set for this example (table 3) is chosen from the domain of fats and 
oils (Ichino & Yaguchi 1989) having four quantitative features of interval type and one 
nominal qualitative feature. The proposed method resulted in two classes. The samples 
of the classes were {0,1, 2, 3,4, 5} and {6,7}. Table 4 shows the two symbolic objects 
representing the two classes. 

Example 4. The microcomputer data set (table 5) (Ichino & Yaguchi 1989) is considered for this experiment. The proposed method resulted in two classes. The samples of the classes were {0, 1, 2, 3, 4, 5, 7, 8, 9, 10, 11} and {6}. Table 6 shows the description of the two classes.

Example 5. The data for this experiment (table 7) are taken from botany (Gowda & Diday 1992). They consist of 9 trees belonging to 3 classes. The proposed method resulted in 3 classes. The samples of the classes were {0, 1, 2}, {3, 4, 5} and {6, 7, 8}. Table 8 shows the three symbolic objects representing the three classes.


Table 8. Description of classes in botanical data.

Cluster no.  Description          Samples in cluster
1            degjknphfCFR         0,1,2
2            cfgilmoswxzejahDFR   3,4,5
3            begjkmpinBJKPR       6,7,8
Table 9. Microprocessor data.

MPU      Clock (MHz)  Gen. reg.*  Inst. (byte)  Cache size  Cache type
i386DX   16-33        8           123           -           Null
i386SX   12-20        8           123           -           Null
i486DX   25-50        8           214           8192        Common
i486SX   20           8           129           8192        Common
68020    12-33        8           99            256         Instruction
68030    16-50        8           105           512         Independent
68040    25           8           140           8192        Independent
MB86901  20-25        120         64            -           Null
MB86930  20-40        136         68            4096        Independent

* General registers


Example 6. The microprocessor data set (table 9) (Ichino & Yaguchi 1989) is considered for this experiment. The proposed method resulted in two classes. The samples of the classes were {0, 1, 2, 3, 4, 5, 6, 8} and {7}. The description of the two classes is shown in table 10.

As mentioned earlier, for purposes of comparison, results obtained by other methods are given in table 11.


5. Conclusion 

A new method for clustering symbolic objects using GAs has been developed. A new type of representation for the chromosome structure is presented, along with two ways of mutating the chromosome structure. Several artificial and real-life data sets with known numbers of classes and classification assignments were used to establish the efficacy of the proposed methodology. Subsequently, the proposed methodology was applied to assertion-type symbolic data sets drawn from the domains of fat oil, microcomputers, botany and microprocessors.


Table 10. Description of classes in microprocessor data.

Cluster no.  Samples          Clock (MHz)  Gen. reg.  Instructions (byte)              Cache size            Cache type
1            0,1,2,3,4,5,6,8  22-28        8          123, 99, 140, 214, 129, 105, 68  8192, 256, 4096, 512  Common, Instruction, Null, Independent
2            7                20-25        120        64                               -                     Null
Table 11. Level of significance values for validating the clusters using Hubert's Γ statistic.

Data            Proposed method  Gowda & Diday        Gowda & Diday           Ichino
                                 (using similarity)   (using dissimilarity)
Fat oil         0.98             0.98                 0.96                    0.98
Microcomputer   0.90             0.67                 0.67                    0.90
Microprocessor  0.79             -                    -                       0.66
Botanical       1.00             1.00                 1.00                    -
Effect of voids on the propagation of waves in an elastic layer
Abstract. The present paper investigates the propagation of waves in an elastic layer containing voids. Numerical calculations indicate that the velocity of propagation of the waves decreases owing to the presence of voids in the material of the layer, and that the voids cause dispersion of the general waveform.

Keywords. Propagation of waves; distribution of voids; surface stress; volume fraction field; wave velocity equation; surface waves.


1. Introduction 

Recently, the theory of elasticity for solid elastic materials containing a distribution of pores, generally known as voids or vacuous pores, has been receiving greater attention owing to its theoretical and practical relevance. The general theory in this respect has been formulated by Nunziato and Cowin (Nunziato & Cowin 1979; Cowin & Nunziato 1983). They also formulated the linearised version of the theory (Cowin & Nunziato 1983), in which the voids are included as an additional kinematic variable. This theory reduces to the classical theory of elasticity in the limiting case when the void volume vanishes. The new theory can play an important role in practical problems of geological and synthetic porous media where the classical theory is inadequate. Some basic theorems and a brief account of the theory of voids have been given by Iesan (1985) and Cowin (1984) respectively. Cowin (1984) presented the inter-relationship between this theory of voids and other theories of elasticity. The uniqueness theorem in the theory of elastic materials with voids has been presented by Chandrasekharaiah (1987b). He investigated plane waves in a rotating elastic solid with voids (Chandrasekharaiah 1987c). The effect of surface stresses and voids on Rayleigh waves in an elastic medium was also investigated by him (Chandrasekharaiah 1987c). Following the above theory, an attempt has been made in this paper to carry out a thorough investigation of the propagation of waves and v… in an isotropic, homogeneous, elastic solid layer containing a distribution of voids. The authors believe that the problem in its present form has not been discussed so far. In the present investigation, the results obtained agree with the corresponding classical results when the parameter for the void character of the material medium tends to zero.


2. Formulation of the problem and boundary conditions 


Let us introduce a rectangular Cartesian frame of reference $Ox_1x_2x_3$ in the middle plane of the elastic layer. We consider the effect of voids on the propagation of waves in an elastic layer of thickness 2h. The planes bounding the layer, $x_3 = \pm h$, are supposed to be free of stresses. Plane waves move with a constant velocity c in the positive direction of the $x_1$ axis. Both longitudinal and transverse waves propagate in the infinitely extended layer. The boundary surfaces of the elastic space lead to a distortion of the state of stress, which also influences the velocity of propagation of elastic waves. Considering the nature of the problem, we may take $u_1$ and $u_3$ as the non-zero components of the displacement u at any point, and they may be expressed in the form


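$$u_1 = \frac{\partial P}{\partial x_1} - \frac{\partial Q}{\partial x_3}, \qquad u_3 = \frac{\partial P}{\partial x_3} + \frac{\partial Q}{\partial x_1}, \tag{1}$$

where P and Q are displacement potentials which are functions of the coordinates $x_1$, $x_3$ and time t. The dynamical equations of motion (Nunziato & Cowin 1979; Cowin & Nunziato 1983; Chandrasekharaiah 1987a) are

$$\mu \nabla^2 \mathbf{u} + (\lambda + \mu)\nabla\nabla\cdot\mathbf{u} + \beta\nabla\phi = \rho\,\frac{\partial^2 \mathbf{u}}{\partial t^2}, \tag{2}$$

$$\alpha \nabla^2 \phi - \xi\phi - \omega\frac{\partial \phi}{\partial t} - \beta\nabla\cdot\mathbf{u} = \rho k\,\frac{\partial^2 \phi}{\partial t^2}, \tag{3}$$

where $\phi$ is the volume-fraction field; $\lambda$ and $\mu$ are the Lamé elastic constants; $\rho$ is the mass density; and $\alpha$, $\beta$, $\xi$, $\omega$ and k are new material constants characterizing the presence of voids.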

For a plane deformation parallel to the x_1x_3 plane we take

$$\mathbf{u} = (u_1, 0, u_3). \tag{4}$$


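From (1), (2) and (4), we get the following differential equations

$$\nabla^2 P - \frac{1}{a^2}\frac{\partial^2 P}{\partial t^2} + \frac{\beta}{\lambda+2\mu}\,\phi = 0, \tag{5}$$

$$\nabla^2 Q - \frac{1}{b^2}\frac{\partial^2 Q}{\partial t^2} = 0. \tag{6}$$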


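Eliminating φ from (3) and (5) we obtain

$$\left[\left(\nabla^2 - \frac{1}{a^2}\frac{\partial^2}{\partial t^2}\right)\left\{\nabla^2 - \frac{1}{\alpha^*}\left(1 + \omega^*\frac{\partial}{\partial t} + \kappa^*\frac{\partial^2}{\partial t^2}\right)\right\} + \beta^*\nabla^2\right]P = 0, \tag{7}$$

where

$$a^2 = \frac{\lambda+2\mu}{\rho},\quad b^2 = \frac{\mu}{\rho},\quad \alpha^* = \frac{\alpha}{\xi},\quad \omega^* = \frac{\omega}{\xi},\quad \kappa^* = \frac{\rho k}{\xi},\quad \beta^* = \frac{\beta^2}{\alpha(\lambda+2\mu)}. \tag{8}$$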

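In the presence of voids the stress tensor σ_ij obeys the following law (Nunziato & Cowin
1979)

$$\sigma_{ij} = \lambda\delta_{ij}u_{k,k} + \mu(u_{i,j} + u_{j,i}) + \beta\delta_{ij}\phi. \tag{9}$$

We seek solutions of (5) and (6) subject to the boundary conditions

$$\sigma_{33} = \sigma_{31} = 0 \quad \text{on the planes } x_3 = \pm h. \tag{10}$$

Following Nunziato & Cowin (1979), we take the boundary condition due to the void nature
of the material medium as

$$\frac{\partial\phi}{\partial x_3} = 0 \quad \text{for } x_3 = \pm h. \tag{10a}$$

Now using (1), (4) and (5) in (9) we get

$$\sigma_{31} = \mu\left(2\frac{\partial^2 P}{\partial x_1\partial x_3} - \frac{\partial^2 Q}{\partial x_3^2} + \frac{\partial^2 Q}{\partial x_1^2}\right),\qquad \sigma_{33} = \mu\left(2\frac{\partial^2 Q}{\partial x_1\partial x_3} - 2\frac{\partial^2 P}{\partial x_1^2} + \frac{1}{b^2}\frac{\partial^2 P}{\partial t^2}\right). \tag{11}$$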


3. Method of solution 

To solve (6) and (7) we take P and Q in the following forms

$$[P, Q] = [\bar P(x_3), \bar Q(x_3)]\,\exp i(\eta x_1 - \zeta t), \tag{12}$$

where ζ is the angular frequency, which is a real constant in our problem, P̄(x_3) and Q̄(x_3)
are functions of x_3, η is an unknown complex constant and i = √−1.

Introducing (12) in (6) and (7) we get the following differential equations

Solutions of (13) and (14) are taken in the following form

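$$\bar P = A_1\sinh\eta m_1x_3 + A_2\cosh\eta m_1x_3 + A_3\sinh\eta m_2x_3 + A_4\cosh\eta m_2x_3,$$

$$\bar Q = A_5\sinh\eta m_3x_3 + A_6\cosh\eta m_3x_3,$$

where m_1 and m_2 are the roots, with positive real parts, of the equation

$$(1-m^2)^2 - (1-m^2)\left[r^2 - \frac{1}{\eta^2\alpha^*}\left(1 - i\zeta\omega^* - \zeta^2\kappa^*\right) + \frac{\beta^*}{\eta^2}\right] - \frac{r^2}{\eta^2\alpha^*}\left(1 - i\zeta\omega^* - \zeta^2\kappa^*\right) = 0, \tag{15}$$

$$m_3 = (1 - s^2)^{1/2}. \tag{16}$$

A_1, A_2, A_3, A_4, A_5, A_6 are arbitrary constants and

$$c = \zeta/\eta, \qquad r = c/a, \qquad s = c/b. \tag{17}$$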
Applying the boundary conditions (10) and (10a), one obtains the following:

$$\begin{aligned}
&2im_1q_1A_1 + 2im_1p_1A_2 + 2im_2q_2A_3 + 2im_2p_2A_4 - (1+m_3^2)p_3A_5 - (1+m_3^2)q_3A_6 = 0,\\
&2im_1q_1A_1 - 2im_1p_1A_2 + 2im_2q_2A_3 - 2im_2p_2A_4 + (1+m_3^2)p_3A_5 - (1+m_3^2)q_3A_6 = 0,\\
&(2-s^2)p_1A_1 + (2-s^2)q_1A_2 + (2-s^2)p_2A_3 + (2-s^2)q_2A_4 + 2im_3q_3A_5 + 2im_3p_3A_6 = 0,\\
&(2-s^2)p_1A_1 - (2-s^2)q_1A_2 + (2-s^2)p_2A_3 - (2-s^2)q_2A_4 - 2im_3q_3A_5 + 2im_3p_3A_6 = 0,\\
&m_1n_1q_1A_1 + m_1n_1p_1A_2 + m_2n_2q_2A_3 + m_2n_2p_2A_4 = 0,\\
&m_1n_1q_1A_1 - m_1n_1p_1A_2 + m_2n_2q_2A_3 - m_2n_2p_2A_4 = 0,
\end{aligned} \tag{18}$$

where

$$p_j = \sinh\eta m_jh,\quad q_j = \cosh\eta m_jh,\quad j = 1, 2, 3;\qquad n_1 = m_1^2 + r^2 - 1,\quad n_2 = m_2^2 + r^2 - 1. \tag{19}$$

Elimination of the constants in (18) gives

$$\Delta = \det[a_{ij}] = 0, \qquad i, j = 1, 2, \ldots, 6, \tag{20}$$

where

$$\begin{aligned}
&a_{11} = 2im_1q_1,\; a_{12} = 2im_1p_1,\; a_{13} = 2im_2q_2,\; a_{14} = 2im_2p_2,\; a_{15} = -(1+m_3^2)p_3,\; a_{16} = -(1+m_3^2)q_3,\\
&a_{21} = 2im_1q_1,\; a_{22} = -2im_1p_1,\; a_{23} = 2im_2q_2,\; a_{24} = -2im_2p_2,\; a_{25} = (1+m_3^2)p_3,\; a_{26} = -(1+m_3^2)q_3,\\
&a_{31} = (2-s^2)p_1,\; a_{32} = (2-s^2)q_1,\; a_{33} = (2-s^2)p_2,\; a_{34} = (2-s^2)q_2,\; a_{35} = 2im_3q_3,\; a_{36} = 2im_3p_3,\\
&a_{41} = (2-s^2)p_1,\; a_{42} = -(2-s^2)q_1,\; a_{43} = (2-s^2)p_2,\; a_{44} = -(2-s^2)q_2,\; a_{45} = -2im_3q_3,\; a_{46} = 2im_3p_3,\\
&a_{51} = m_1n_1q_1,\; a_{52} = m_1n_1p_1,\; a_{53} = m_2n_2q_2,\; a_{54} = m_2n_2p_2,\; a_{55} = 0,\; a_{56} = 0,\\
&a_{61} = m_1n_1q_1,\; a_{62} = -m_1n_1p_1,\; a_{63} = m_2n_2q_2,\; a_{64} = -m_2n_2p_2,\; a_{65} = 0,\; a_{66} = 0.
\end{aligned}$$

Equation (20) represents the wave velocity equation for surface waves in an elastic layer
with voids. This equation contains c and η as the only unknown quantities, and hence c can
be expressed as a function of η, indicating the dispersive nature of the waves considered.
This dispersive nature of the general waveform arises from the presence of voids in the
material medium. The above sixth-order determinant Δ can be expressed, up to a non-zero
factor, as the product of two third-order determinants:

$$\Delta = \Delta_1\cdot\Delta_2,$$

where

$$\Delta_1 = \begin{vmatrix} m_1n_1\tanh\eta m_1h & m_2n_2\tanh\eta m_2h & 0\\ \dfrac{2m_1\tanh\eta m_1h}{2-s^2} & \dfrac{2m_2\tanh\eta m_2h}{2-s^2} & \dfrac{(1+m_3^2)\tanh\eta m_3h}{2m_3}\\ 1 & 1 & 1 \end{vmatrix}$$

and

$$\Delta_2 = \begin{vmatrix} m_1n_1 & m_2n_2 & 0\\ 2m_1 & 2m_2 & 1+m_3^2\\ (2-s^2)\tanh\eta m_1h & (2-s^2)\tanh\eta m_2h & 2m_3\tanh\eta m_3h \end{vmatrix}.$$

Hence (20) implies either Δ1 = 0 or Δ2 = 0.
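The splitting of Δ into two third-order factors can be checked numerically. The sketch below is our own check, not part of the original analysis: it assembles the 6×6 matrix a_ij of (18) for arbitrary complex trial values of m_j, n_j, s, η and h (placeholders, not material data) and compares |Δ| with the product of the determinants of the two 3×3 blocks obtained by adding and subtracting the pairs of equations written at x_3 = h and x_3 = −h. The blocks act on (A1, A3, A6) and (A2, A4, A5); by our reduction the latter is proportional to Δ1 and the former to Δ2. The row operations contribute a constant factor, so |Δ| = 8 |block_136| |block_245|.

```python
import cmath
import random


def det(mat):
    """Determinant by cofactor (Laplace) expansion along the first row; fine for n <= 6."""
    if len(mat) == 1:
        return mat[0][0]
    total = 0
    for j, entry in enumerate(mat[0]):
        minor = [row[:j] + row[j + 1:] for row in mat[1:]]
        total += (-1) ** j * entry * det(minor)
    return total


def hyperbolic_terms(m, eta, h):
    """p_j = sinh(eta m_j h) and q_j = cosh(eta m_j h), as in (19)."""
    p = [cmath.sinh(eta * mj * h) for mj in m]
    q = [cmath.cosh(eta * mj * h) for mj in m]
    return p, q


def boundary_matrix(m, n, s, eta, h):
    """6x6 matrix a_ij of (18); rows 1-2: sigma_31, rows 3-4: sigma_33, rows 5-6: dphi/dx3."""
    (m1, m2, m3), (n1, n2) = m, n
    p, q = hyperbolic_terms(m, eta, h)
    X, Y = 2 - s**2, 1 + m3**2
    return [
        [2j*m1*q[0], 2j*m1*p[0], 2j*m2*q[1], 2j*m2*p[1], -Y*p[2], -Y*q[2]],
        [2j*m1*q[0], -2j*m1*p[0], 2j*m2*q[1], -2j*m2*p[1], Y*p[2], -Y*q[2]],
        [X*p[0], X*q[0], X*p[1], X*q[1], 2j*m3*q[2], 2j*m3*p[2]],
        [X*p[0], -X*q[0], X*p[1], -X*q[1], -2j*m3*q[2], 2j*m3*p[2]],
        [m1*n1*q[0], m1*n1*p[0], m2*n2*q[1], m2*n2*p[1], 0, 0],
        [m1*n1*q[0], -m1*n1*p[0], m2*n2*q[1], -m2*n2*p[1], 0, 0],
    ]


def block_determinants(m, n, s, eta, h):
    """3x3 blocks on (A1, A3, A6) and (A2, A4, A5) after adding/subtracting row pairs."""
    (m1, m2, m3), (n1, n2) = m, n
    p, q = hyperbolic_terms(m, eta, h)
    X, Y = 2 - s**2, 1 + m3**2
    block_136 = [[2j*m1*q[0], 2j*m2*q[1], -Y*q[2]],
                 [X*p[0], X*p[1], 2j*m3*p[2]],
                 [m1*n1*q[0], m2*n2*q[1], 0]]
    block_245 = [[2j*m1*p[0], 2j*m2*p[1], -Y*p[2]],
                 [X*q[0], X*q[1], 2j*m3*q[2]],
                 [m1*n1*p[0], m2*n2*p[1], 0]]
    return det(block_136), det(block_245)


# Arbitrary complex trial values (placeholders, not material data).
random.seed(1)
m = tuple(complex(random.uniform(0.3, 1.5), random.uniform(-0.2, 0.2)) for _ in range(3))
n = tuple(complex(random.uniform(-1.0, 1.0), random.uniform(-0.2, 0.2)) for _ in range(2))
s, eta, h = 0.9, 1.3, 0.7

d6 = det(boundary_matrix(m, n, s, eta, h))
d_136, d_245 = block_determinants(m, n, s, eta, h)
print(abs(d6), 8 * abs(d_136) * abs(d_245))  # the two magnitudes agree
```

Because the check holds for arbitrary values of m_j and n_j, it tests only the algebraic structure of (18), not the dispersion relation (15).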

We now discuss each of the above cases separately as follows 


Case A (Δ1 = 0): After simplification, Δ1 = 0 gives

$$\frac{m_1n_1\tanh\eta m_1h}{m_2n_2\tanh\eta m_2h} = \frac{\dfrac{2m_1\tanh\eta m_1h}{2-s^2} - \dfrac{(1+m_3^2)\tanh\eta m_3h}{2m_3}}{\dfrac{2m_2\tanh\eta m_2h}{2-s^2} - \dfrac{(1+m_3^2)\tanh\eta m_3h}{2m_3}}. \tag{21}$$


Case A1: If the wavelength is large in comparison with the thickness 2h of the layer, the
hyperbolic tangents can be replaced by their arguments, and (21) becomes

$$4R^2V^{*2} - (2-s^2)^2 = 0, \tag{22}$$

where

$$R^2 = 1 - r^2,\quad R_1 = (m_1/R)^2,\quad R_2 = (m_2/R)^2,\quad V^{*2} = \frac{R_1R_2}{R_1 + R_2 - 1}. \tag{23}$$

Equation (22) determines the wave velocity of plane waves in an elastic layer containing
voids and is the counterpart of the results obtained by Rayleigh (1889) and Lamb (1916).
When the medium is free from voids we have m_1 = R, R_1 = 1, V* = 1, and we recover the
classical results of Rayleigh (1889) and Lamb (1916). Thus the wave velocity of Rayleigh
and Lamb in the presence of voids may be obtained from the corresponding classical form by
replacing R by RV*, where V* is given by (23). For low-frequency waves we ignore
higher-degree terms in ζ (Chandrasekharaiah 1987a). With this approximation and with the
help of (8), (15) and (17), (22) becomes

$$4V_0^2 - (2-s^2)^2 = 0, \tag{24}$$

where

$$V_0 = \left[1 - \frac{c^2}{a^2(1-N)}\right]^{1/2} \tag{25}$$

and

$$N = \alpha^*\beta^* = \frac{\beta^2}{\xi(\lambda+2\mu)}. \tag{26}$$


Case A2: If the wavelength is very small in comparison with the thickness 2h of the
layer, we may assume that the ratio of hyperbolic tangents in (21) approaches unity,
and hence (21) becomes

$$4m_3RR^* - (2-s^2)^2 = 0, \tag{27}$$

where

$$R = (1 - r^2)^{1/2},\quad R_1 = m_1/R,\quad R_2 = m_2/R,\quad R^* = \frac{R_1R_2(R_1+R_2)}{R_1^2 + R_2^2 + R_1R_2 - 1}. \tag{28}$$

Equation (27) determines the velocity of Rayleigh surface waves in an elastic layer with
voids. For low-frequency waves, which play an important role in analysing motions caused by
earthquakes and explosions, we neglect the higher-degree terms in ζ (Chandrasekharaiah
1987a). With this approximation, equation (27) transforms to

$$4m_3R_0 - (2-s^2)^2 = 0, \tag{29}$$

where

$$R_0 = \left[1 - \frac{c^2}{a^2(1-N)}\right]^{1/2},\qquad N = \frac{\beta^2}{\xi(\lambda+2\mu)}. \tag{30}$$


Effect of voids on the propagation of waves 


483 


Case B (Δ2 = 0): On simplification, Δ2 = 0 gives

$$4m_1m_3 - (1+m_3^2)(2-s^2)\frac{\tanh\eta m_1h}{\tanh\eta m_3h} = \frac{m_1n_1}{m_2n_2}\left[4m_2m_3 - (1+m_3^2)(2-s^2)\frac{\tanh\eta m_2h}{\tanh\eta m_3h}\right]. \tag{31}$$


Case B1: If the wavelength is large in comparison with the thickness of the layer, the
hyperbolic tangents can be replaced by the first two terms of their series expansions,
and hence (31) becomes

$$4m_1m_3 - (1+m_3^2)(2-s^2)\frac{m_1\left(1 - \tfrac13\eta^2h^2m_1^2\right)}{m_3\left(1 - \tfrac13\eta^2h^2m_3^2\right)} = \frac{m_1n_1}{m_2n_2}\left\{4m_2m_3 - (1+m_3^2)(2-s^2)\frac{m_2\left(1 - \tfrac13\eta^2h^2m_2^2\right)}{m_3\left(1 - \tfrac13\eta^2h^2m_3^2\right)}\right\}. \tag{32}$$

Equation (32) may be regarded as the revised form, for an elastic layer with voids, of the
classical result obtained by Rayleigh (1889) and Lamb (1916). If the layer is free from
voids (β = 0), (32) simplifies to

$$\frac{c^2}{b^2} = \frac{4}{3}\eta^2h^2\left(1 - \frac{b^2}{a^2}\right),$$

which is the classical result of Rayleigh (1889) and Lamb (1916).
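The void-free limit of (32) makes the dispersion explicit: the phase velocity grows linearly with ηh. A minimal sketch, in which the function name and the sample values of ηh and (a/b)² are illustrative assumptions:

```python
import math


def flexural_velocity_ratio(eta_h, a_over_b_sq):
    """c/b from the void-free long-wave limit of (32): c^2/b^2 = (4/3)(eta h)^2 (1 - b^2/a^2)."""
    return math.sqrt(4.0 / 3.0 * eta_h**2 * (1.0 - 1.0 / a_over_b_sq))


# Illustrative values: (a/b)^2 = 3 and a few small wavenumber-thickness products.
for eta_h in (0.05, 0.1, 0.2):
    print(f"eta*h = {eta_h:4.2f}  ->  c/b = {flexural_velocity_ratio(eta_h, 3.0):.5f}")
```

Doubling ηh doubles c/b, which is the hallmark of long flexural waves in a thin layer; the formula is only valid while ηh remains small.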


Case B2: If the wavelength is small in comparison with the thickness of the layer,
the ratio of hyperbolic tangents in (31) may be approximated by unity, and (31) reduces
to (27) which determines the velocity of Rayleigh surface waves in an elastic layer with 
voids. 


4. Numerical results 
From (24) we obtain

$$s = 2\left\{1 - \frac{1}{(a^2/b^2)(1-N)}\right\}^{1/2}.$$
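As a check on table 1, the closed-form expression for s can be evaluated directly. The sketch below (the function name is hypothetical) regenerates the table rows to four decimals; it assumes (a²/b²)(1 − N) > 1, otherwise the square root is not real.

```python
import math


def s_case_a1(a_over_b_sq, N):
    """s = c/b from (24): s = 2 {1 - 1/[(a^2/b^2)(1 - N)]}^(1/2); needs (a^2/b^2)(1 - N) > 1."""
    return 2.0 * math.sqrt(1.0 - 1.0 / (a_over_b_sq * (1.0 - N)))


ratios = [2.3710, 2.4758, 2.5806, 2.6854, 2.7902, 2.8950, 2.9980, 3.5, 4.3]
for N in (0.0, 0.2, 0.3, 0.4, 0.5):
    print(N, " ".join(f"{s_case_a1(r, N):.4f}" for r in ratios))
```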


Table 1. Values of s for case A1.

                                  (a/b)²
N       2.3710  2.4758  2.5806  2.6854  2.7902  2.8950  2.9980  3.5     4.3
0       1.5208  1.5441  1.5652  1.5844  1.6020  1.6181  1.6327  1.6903  1.7521
0.2000  1.3752  1.4073  1.4361  1.4622  1.4859  1.5076  1.5272  1.6036  1.6844
0.3000  1.2609  1.3008  1.3363  1.3682  1.3972  1.4234  1.4471  1.5386  1.6343
0.4000  1.0901  1.1434  1.1902  1.2318  1.2691  1.3028  1.3328  1.4475  1.5651
0.5000  0.7911  0.8768  0.9487  1.0104  1.0643  1.1120  1.1539  1.3093  1.4627
Table 2. Values of s for case A2.

                                  (a/b)²
N       2.3710  2.4758  2.5806  2.6854  2.7902  2.8950  2.9980  3.5
0       0.8996  0.9042  0.9082  0.9116  0.9145  0.9171  0.9194  0.9274
0.2000  0.8627  0.8721  0.8799  0.8865  0.8920  0.8968  0.9009  0.9142
0.3000  0.8221  0.8375  0.8501  0.8605  0.8692  0.8766  0.8827  0.9030
0.4000  0.7406  0.7686  0.7913  0.8100  0.8254  0.8383  0.8489  0.8797
0.5000  0.5547  0.6115  0.6574  0.6951  0.7263  0.7524  0.7739  0.8406


Values of s for different values of (a/b)² and N for case A1 are shown in table 1.
It is observed from table 1 that the wave velocity decreases with increasing N for a
given value of (a/b)². We further note that, for a given value of N, the wave velocity
increases with increasing (a/b)².

Again, from (29) one obtains

$$s^6 - 8s^4 + \left\{24 - \frac{16}{(a^2/b^2)(1-N)}\right\}s^2 - \left\{16 - \frac{16}{(a^2/b^2)(1-N)}\right\} = 0.$$
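Table 2 requires a root of this cubic in s². A minimal sketch (function name hypothetical), assuming the physically relevant root is the one with 0 < s < 1, where all tabulated values lie: with K = (a²/b²)(1 − N) > 1 the cubic in x = s² is negative at x = 0 and equals 1 at x = 1, so bisection on [0, 1] brackets it.

```python
import math


def s_case_a2(a_over_b_sq, N, tol=1e-12):
    """Root 0 < s < 1 of s^6 - 8 s^4 + (24 - 16/K) s^2 - (16 - 16/K) = 0, K = (a^2/b^2)(1 - N)."""
    K = a_over_b_sq * (1.0 - N)
    if K <= 1.0:
        raise ValueError("need (a^2/b^2)(1 - N) > 1")

    def cubic(x):  # the sextic in s written as a cubic in x = s^2
        return x**3 - 8.0 * x**2 + (24.0 - 16.0 / K) * x - (16.0 - 16.0 / K)

    lo, hi = 0.0, 1.0  # cubic(0) = -16(1 - 1/K) < 0 and cubic(1) = 1 > 0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if cubic(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return math.sqrt(0.5 * (lo + hi))


ratios = [2.3710, 2.4758, 2.5806, 2.6854, 2.7902, 2.8950, 2.9980, 3.5]
for N in (0.0, 0.2, 0.3, 0.4, 0.5):
    print(N, " ".join(f"{s_case_a2(r, N):.4f}" for r in ratios))
```

For N = 0 this is the classical Rayleigh-wave equation, so the first row doubles as a sanity check against the familiar Rayleigh speed of a void-free half-space.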

Values of s for different values of N and (a/b)² for case A2 are shown in table 2.
From (22), (27) and (32) we see that the wave velocity equation contains c and η as
the only unknown quantities, and hence c can be expressed as a function of η,
indicating the dispersive nature of the waves.

Table 2 reveals that the Rayleigh wave velocity in a layer with voids decreases as N
increases for a given value of (a/b)², and that, for a given value of N, the Rayleigh
wave velocity increases with increasing (a/b)².

Similar computations may be made and conclusions drawn for the other cases.

5. Conclusions

The most significant outcome of the paper is that voids modulate the surface waves, both
by reducing their speed and by causing dispersion.


The authors are very grateful to the reviewer for his/her valuable comments and suggestions
towards the improvement of this paper.
Abstract. A simulation-optimization procedure is presented for evaluating
the extent of interbasin transfer of water in the Peninsular Indian river system,
consisting of 15 reservoirs on four river basins. A system-dependent simulation
model is developed incorporating the concept of reservoir zoning to facilitate
releases and transfers. The simulation model generates a large number of
solutions, which are then screened by the optimization model. The Box complex
nonlinear programming algorithm is used for the optimization. The performance
of the system with the optimal reservoir zones is evaluated through simulation
with respect to four indices: reliability, resiliency, vulnerability and
deficit ratio. The results indicate that by operating the system of 15 reservoirs
as a single unit, the existing utilization of water may be increased significantly.

Keywords. Reservoir operation; simulation; optimization; reliability. 

1. Introduction 

The distribution of water resources is, in general, uneven in most countries. In India, the 
distribution is uneven both in time and space. Rainfall, which is the prime source of water 
in India, is mostly confined to the four monsoon months of June to September. The eight 
non-monsoon months receive less than 10% of the annual rainfall, as a result of which many
parts of the country experience a scarcity of water during these months. The distribution
of water over space is also uneven, with about 64% of the total water concentrated in the
Himalayan river basins of Ganga, Indus and Brahmaputra. It is estimated that because of
this uneven distribution, one-third of the country is drought-prone while about one-eighth
of the country is flood-prone. To enhance the utilization of water resources through
better distribution, the Government of India proposed the National Perspective Plan (NPP)
for water resources development, consisting of two components, the Himalayan River 
Development and the Peninsular River Development (Ministry of Irrigation 1980). In 
the present study, interbasin transfer of water over a major part of the peninsular river
component is analysed. The system considered for analysis consists of 15 reservoirs on
four rivers, Godavari, Krishna, Pennar and Cauvery, as shown schematically in figure 1.
The reservoirs considered in the configuration are the Nizamsagar (NZS), Sreeramsagar
(SRSP), Inchampalli (IC) and Polavaram (POL) reservoirs on the Godavari; the Tungabhadra
(TB), Sunkesula (SA), Srisailam (SS), Nagarjunasagar (NS) and Pulichintala (PC)
reservoirs and the Prakasam Barrage (PB) on the river Krishna; the Mylavaram (MYL) and
Somasila (SMS) reservoirs on the Pennar river; and the Krishnarajasagar (KRS), Mettur
(MET) and Upper Anicut (UA) reservoirs on the Cauvery. For convenience in presentation,
a particular reservoir is also referred to by the node notation (m, n), where m is the
basin number (with m = 1 for the Godavari Basin and m = 4 for the Cauvery Basin;
figure 1) and n is the position of the reservoir in the basin, with n = 1 for the
upstream-most reservoir in that basin. For example, the reservoir (2, 3) denotes the
Srisailam Reservoir in the Krishna Basin. Some salient features of the reservoirs are
presented in table 1. Of the 15 reservoirs, Inchampalli, Polavaram and Pulichintala are
proposed reservoirs. Even though Sunkesula, Prakasam Barrage and Upper Anicut are only
barrages, for the sake of computational simplicity they are also treated as reservoirs
with negligible storage. All the reservoirs supply water for irrigation, while some
reservoirs, including the Nizamsagar, Sreeramsagar, Inchampalli, Tungabhadra, Srisailam,
Nagarjunasagar and Mettur, have power generation plants as well. The transfer links
considered in the system are


Optimal operation of a multibasin reservoir system 


489 


Inchampalli-Nagarjunasagar Dam, Polavaram-Prakasam Barrage, Srisailam-Mylavaram, 
Nagarjunasagar Dam-Somasila and Mylavaram-Upper Anicut. It is expected that excess 
water of the Godavari will be transferred to the Krishna through the first two links. These 
transfers will take care of some of the irrigation demands of the Nagarjunasagar and
Pulichintala dams and the delta demands of the Krishna Basin at Prakasam Barrage. This enables
the water saved at Srisailam to be transferred to Pennar and Cauvery through the last three 
links. Thus, the operation of Inchampalli, Polavaram, Srisailam and Nagarjunasagar is 
considered more significant compared to that of the other reservoirs. 

The transfer of water among the reservoirs is by gravity, except in the case of Inchampalli- 
Nagarjunasagar link where a lift of the order of 100 m is required. In spite of this huge lift, 
this link is considered an important component of the system because the high inflows
joining Inchampalli can be diverted to meet the large irrigation demands at the Nagarjunasagar


Table 1. Salient features of the reservoirs of the system.

Reservoir   Location            Catchment    Command     Power       Live       Dead       Period of
(status*)                       area         area        generated   storage    storage    inflow
                                (×10³ km²)   (×10⁴ ha)   (MW)        (Mm³)      (Mm³)      record
NZS (E)     76°15'E, 18°10'N    21.7         11.13       15          780        60         1944-86
SRSP (E)    78°30'E, 18°55'N    40.5         67.14       36          2320       850        1963-83
IC (P)      80°25'E, 18°37'N    42.7         63.58       975         4286       6089       1950-75
POL (P)     81°46'E, 17°13'N    37.6         29.14       -           2130       3381       1966-86
TB (E)      76°18'E, 15°16'N    28.8         34.80       117         3307       457        1951-85
SA (E)      77°45'E, 15°48'N    36.5         11.30       -           -          -          1966-87
SS (E)      78°54'E, 16°5'N     NA           NA          770         7065       3049       1964-86
NS (E)      79°36'E, 16°45'N    10.0         13.36       110         6940       4610       -
PC (P)      80°3'E, 16°46'N     19.5         NA          -           1026       270        1945-81
PB (E)      80°55'E, 16°35'N    16.6         48.56       -           -          -          1945-81
MYL (E)     78°20'E, 14°51'N    19.2         1.95        -           266        17         1969-86
SMS (E)     79°18'E, 14°29'N    29.4         16.39       -           1994       214        1929-82
KRS (E)     76°31'E, 12°25'N    10.6         11.36       -           1172       125        1934-86
MET (E)     77°55'E, 11°55'N    NA           12.14       200         2647       553        1966-87
UA (E)      78°50'E, 10°50'N    NA           44.52       -           -          -          1966-87

*Status: (E) Existing; (P) Proposed.


Dam. Apart from the transfer links proposed in the configuration, the demands of the
existing century-old link between Krishna and Pennar called the Kurnool-Cuddappah canal are
also protected. The historic data of monthly inflows, salient features and demands at all the 
reservoirs are obtained from the Central Water Commission and various State Government 
agencies. 

In modelling complex multireservoir systems, one methodology often employed is to screen
the potential alternatives first by an optimization model and then to evaluate the
performance of the system with these alternatives in detail by a simulation model (e.g.
Joeres et al 1971; Jacoby & Loucks 1972; Chaturvedi & Srivastava 1981). For a complex water
resources system, such as the one considered in this study, it is practically impossible to
represent all the system features in an optimization model. On the other hand, simplifying
the formulation to make the problem computationally tractable can lead to planning errors.
It is therefore necessary to carry out an initial simulation to reduce the size of the
optimization model. Examples of such studies may be found in the literature (Sigvaldason
1976; Chung & Helweg 1985; Simonovic 1987; Razavian et al 1990; Kuo et al 1990).

In the present study, the concept of reservoir zoning (Beard 1967; Sigvaldason 1976)
is adopted for identifying the limits for releases and interbasin transfers. Each reservoir is
divided into four storage zones as shown in figure 2. The four storage zones are minimum
storage (S_MIN), maximum storage (S_MAX), releasable storage (S_REL) and transferable
storage (S_TRA).

S_MIN: dead storage capacity.
S_REL: storage capacity above which release to downstream reservoirs is possible; no transfer.
S_TRA: storage capacity above which both release and transfer are possible.
S_MAX: maximum live storage capacity.
MDDL: minimum draw down level.

Figure 2. Reservoir zoning.
The objectives of the systems analysis carried out in the study are: (i) to delineate the
different storage zones at each reservoir, (ii) to examine the potential of the transfer links 
envisaged in meeting the existing demands in the command areas of the reservoirs and (iii) 
to quantify the extent to which the water availability can be increased at some important 
reservoirs through an optimal operation of the system. This study is carried out in two 
stages: In the first stage, a detailed simulation model is developed and a large number of 
solutions are generated. The sensitivity of the system performance to changes in priorities, 
storage zone levels, demands and operational strategies is examined in this stage, and ranges 
of the different parameters for which the system performance is sensitive are identified. 
This stage generates a huge database to supply some of the inputs required in the second 
stage. In the second stage, a nonlinear optimization problem is solved to identify the best 
solution within the range identified in the first stage for each parameter. The solution of 
the optimization model specifies the zone levels at each reservoir, the extent to which the 
water availability can be raised at a reservoir and the extent of possible interbasin transfers. 
Details of the two models are discussed in the following sections. 


2. Simulation model 

The simulation model generates a large number of solutions corresponding to various 
levels of the four storage zones. The significance of these zones for the operation of the 
reservoirs is as follows. The minimum storage (S_MIN) is the dead storage capacity and
the maximum storage (S_MAX) is the live storage capacity of a reservoir. Both these levels
(S_MIN and S_MAX) are known constants for every reservoir. The other two storage zones,
the releasable storage (S_REL) and the transferable storage (S_TRA), facilitate releases to
downstream reservoirs and transfers to reservoirs of other basins, respectively. By definition,
if the storage at a reservoir, after satisfying its own demands in a period, is more than S_REL,
then the excess water over S_REL can be released to meet the deficits of the downstream
reservoirs of the same basin. Similarly, after meeting the basin requirements in a period, if
the storage is more than S_TRA, then the excess water over S_TRA can be transferred, if a link
exists, to meet the deficits of reservoirs of the other basins. There are, thus, three purposes 
for which water from a reservoir in the system can be utilized. In order of priority, they 
are (a) to meet the demands from the command area of the reservoir itself, (b) to meet 
completely or partially the demands of the immediate downstream reservoir in the same 
river basin, and (c) to meet completely or partially the demands at a reservoir in another 
basin through transfer links. In this study a ‘diversion’ from a reservoir is defined as the 
water supplied to meet its own demands, a ‘release’ as the water supplied to meet the deficits 
at the downstream reservoirs of the same basin, and a ‘transfer’ as the water supplied to 
meet the deficits at the reservoirs of other basins. 

The aim of the simulation model is to examine the performance of the system for several 
alternatives of storage zones and to identify an initial value and a range for each parameter 
for use in optimization subsequently. The flow chart of the model is given in figure 3. 
At the beginning of the period, the deficits, if any, are determined at each reservoir after 
accounting for diversion. The deficit at a reservoir is reduced or eliminated completely 
either by release from an upstream reservoir or through transfer from another basin (if
such a link exists), or both. The amounts of release and transfer are decided based on the
available storage in a reservoir after meeting the demand from its own catchment. Based
on the priorities listed earlier, release and transfer policies are formulated. The releases
and transfers are made until either the available excess water is exhausted or there is no
need for any more release or transfer.

Figure 3. Flow chart of the simulation model.

2.1 Release and transfer policies 


The release policy aims at minimising spills out of the system. The downstream
reservoirs are depleted first before water is withdrawn from upstream reservoirs. The
release policy is invoked at a reservoir M when the available storage, after accounting for
diversions to meet the demands at the reservoir itself, is more than S^M_REL,j. The release
R_t^{M,L} from the reservoir M to a downstream reservoir L, if one exists, is given by

$$R_t^{M,L} = \begin{cases} \min\left[S_t^M - S_{\mathrm{REL},j}^M,\; DEF_t^L - S_t^L\right], & \text{if } S_t^M > S_{\mathrm{REL},j}^M \text{ and } S_t^L < DEF_t^L,\\ 0, & \text{otherwise}, \end{cases} \tag{1}$$

where S_t^M is the storage at reservoir M during period t after accounting for its own demand
d_t^M, the releases and transfers committed to the reservoir M from other reservoirs, and the
release commitments made from the reservoir M to other reservoirs downstream of M; S_t^L is
the storage at the reservoir L after accounting for all transfers and releases committed to it
during the period from other reservoirs downstream of M; and S^M_REL,j is the releasable
storage for the reservoir M in season j to which the period t belongs. In this study a year is
divided into two seasons, the monsoon season (j = 1), comprising the months June to November,
and the non-monsoon season (j = 2), comprising the months December to May. Starting from
reservoir M, the releases are computed for all the downstream reservoirs, beginning with
reservoir M + 1 and proceeding downstream until either the releasable amount of water at
reservoir M is exhausted or the demands at all reservoirs of the basin are met.
For example, if excess water is available at reservoir (1, 2) in figure 1 and an initial
deficit exists at (1, 4), then this deficit is met by a release from (1, 3) if possible, and a
release from (1, 2) is made only if the release from (1, 3) fails to meet the deficit at (1, 4).
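The release rule (1), together with the nearest-upstream-first order in the example above, can be sketched as follows. This is an illustrative reading, not the authors' code: the helper `meet_deficit_from_upstream` and the sample storages are hypothetical, reservoirs of one basin are indexed 0, 1, 2, ... from upstream (index 3 plays the role of (1, 4)), DEF is treated as the level the receiving reservoir should reach, exactly as it enters (1), and loss coefficients and the seasonal switching of S_REL are omitted.

```python
def release(S_M, S_rel_M, S_L, DEF_L):
    """Equation (1): release from reservoir M to downstream reservoir L in period t."""
    if S_M > S_rel_M and S_L < DEF_L:
        return min(S_M - S_rel_M, DEF_L - S_L)
    return 0.0


def meet_deficit_from_upstream(L, storage, s_rel, DEF):
    """Hypothetical helper: serve the deficit at reservoir L of one basin, trying the
    nearest upstream reservoir first (L-1, then L-2, ...). Losses are ignored."""
    storage = list(storage)          # work on a copy of the period-t storages
    releases = {}
    for M in range(L - 1, -1, -1):   # nearest upstream reservoir first
        r = release(storage[M], s_rel[M], storage[L], DEF[L])
        if r > 0:
            storage[M] -= r
            storage[L] += r
            releases[M] = r
        if storage[L] >= DEF[L]:     # deficit fully met: stop going upstream
            break
    return releases, storage


# Hypothetical basin with four reservoirs; only the last one is in deficit.
storage = [500.0, 300.0, 120.0, 40.0]
s_rel = [400.0, 250.0, 100.0, 50.0]
DEF = [0.0, 0.0, 0.0, 100.0]
releases, new_storage = meet_deficit_from_upstream(3, storage, s_rel, DEF)
print(releases, new_storage)
```

Here reservoir 2 can spare only 20 units above its S_REL, so the remaining 40 come from reservoir 1 and reservoir 0 is never touched, mirroring the (1, 3) before (1, 2) order in the text.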

The transfer policy is similar to the release policy. The deficit at a reservoir, after
accounting for diversion from the particular reservoir itself and releases from reservoirs in
its own basin, is met either partially or fully by transfer from reservoirs of other basins if
a transfer link exists. The amount of water transferred, T_t^{M,P}, from reservoir M of a basin
to reservoir P of another basin in period t, when a transfer link exists, is given by

$$T_t^{M,P} = \begin{cases} \min\left[S_{2t}^M - S_{\mathrm{TRA},j}^M,\; DEF_t^P - S_t^P\right], & \text{if } S_{2t}^M > S_{\mathrm{TRA},j}^M \text{ and } S_t^P < DEF_t^P,\\ 0, & \text{otherwise}, \end{cases} \tag{2}$$

where S_2t^M is the storage available at reservoir M after accounting for the diversion and
release, S^M_TRA,j is the transferable storage in reservoir M for the season j to which the
period t belongs, S_t^P is the storage at the reservoir P after accounting for diversion and
release and for transfers committed to it for the period (by other reservoirs during the
computations prior to those for reservoir M), and DEF_t^P is the deficit at the reservoir P in
period t, corresponding to the storage S_t^P.
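A sketch of the transfer rule (2) with the downstream-most-first hierarchy described below it. The helper `transfers_to`, the node labels and all numbers are hypothetical; routing through intermediate reservoirs and loss coefficients are ignored.

```python
def transfer(S2_M, S_tra_M, S_P, DEF_P):
    """Equation (2): transfer from reservoir M (one basin) to reservoir P (another basin)."""
    if S2_M > S_tra_M and S_P < DEF_P:
        return min(S2_M - S_tra_M, DEF_P - S_P)
    return 0.0


def transfers_to(P_storage, P_def, sources):
    """Hypothetical helper: `sources` lists (label, S2, S_tra) for linked reservoirs of
    another basin, ordered from downstream-most to upstream-most as in the text."""
    plan = []
    for label, S2, S_tra in sources:
        t = transfer(S2, S_tra, P_storage, P_def)
        if t > 0:
            plan.append((label, t))
            P_storage += t
        if P_storage >= P_def:       # deficit at P fully met
            break
    return plan, P_storage


# Deficit at (2, 6): try (1, 4) first, then (1, 3).
plan, final_storage = transfers_to(200.0, 300.0, [("(1,4)", 1000.0, 950.0), ("(1,3)", 2000.0, 1800.0)])
print(plan, final_storage)
```

The upstream source (1, 3) supplies only what (1, 4) could not, which is the hierarchy the text describes for a deficit at (2, 6).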

At any reservoir if a deficit exists even after the promised releases from reservoirs of 
the same basin, the deficit is met partially or completely with transfer from other basins. 
This transfer is not necessarily through a direct link between the two reservoirs and may 
be routed through other reservoirs. For example, if reservoir (2, 6) is under deficit, then, 
in simulation, a transfer is first tried from (1,4) to meet the deficits at (2, 6) and, only if 
this transfer does not meet the deficits completely, is a transfer made from (1,3), routed 
either through (2,4) or through (1,4) or both. Also, the hierarchy of the downstream-most 
reservoir to the upstream-most reservoir is maintained while examining the transfers; that 
is, a transfer from an upstream reservoir is considered only if there is no transfer link from 
a downstream reservoir, or if the transfer from the downstream reservoir is inadequate. 
Appropriate loss coefficients are incorporated for diversion, release and transfers from a 
reservoir. 

A major objective of the interbasin transfers in this study is to examine the extent to 
which the existing utilizations could be raised. Two parameters, INCR1 for the monsoon 
season (June-November) and INCR2 for the non-monsoon season (December-May), are 
introduced as multiplying factors to the irrigation demands in the corresponding periods. 
The power generation demands are kept at their existing level. Thus, the extent to which 
the command areas can be increased at every reservoir as a result of the interbasin transfer 
is evaluated. The information on the extent of additional land that can be brought under 
irrigation is not available. The present study is carried out in the absence of such information 
and the INCR1 and INCR2 parameters are increased on the basis of their effect on the
performance of the system as a whole. In case of reservoirs on the Godavari and the 
Krishna basins, however, both these parameters are restricted to 2.0 as the command areas 
are generally well developed at these reservoirs. 

A sensitivity analysis with 15 (one per reservoir) each of the INCR1, INCR2, S_REL,1, S_REL,2,
S_TRA,1 and S_TRA,2 parameters is carried out to identify the most productive range for each
parameter and to evaluate the performance of the system under various alternatives. The
existing demands in both the periods are protected and care is taken to see that, as far as
possible, the deficits at any reservoir in any period are below or in the vicinity of 10% of
the total demands. If there is a conflict between water for irrigation and power generation,
diversions for irrigation are given preference, as all the reservoirs are primarily operated
for irrigation. The performance of the system with different alternatives is compared with
an objective function which maximises the utilization of resources while penalising the 
deficits. In the case of a trade-off among different reservoirs, the reservoirs on Cauvery 
and Pennar, for example, are given preference as the command areas are underdeveloped 
in these basins. 

Statistical analysis carried out to fit theoretical probability distributions to the historic
data at the reservoir sites reveals that in 149 of the 168 data sets (corresponding to 14
reservoirs and 12 months) the log-normal distribution fits the data set reasonably well. For
the purpose of simulation, synthetic streamflows generated for a longer period with
logarithmically transformed data are used. The Thomas-Fiering model, incorporating corrections
suggested by Matalas (1967), is used for synthetic generation. A limitation of this model is
that it is a single-site model and therefore does not preserve the cross-correlations among
different rivers. A better approach would be to use one of the multi-site models (e.g., the
MARMA type of models).

Table 2. Prominent transfers for 50-year simulation analysis.
Units: million cubic metres; figures in brackets indicate the number of times the transfer has
taken place in a 50-year period.

From   Inchampalli                         Polavaram   Srisailam
To     NS          PC          PB          PB          MYL         SMS         UA
Jun    -           257(4)      635(5)      5302(18)    223(15)     48(2)       18259(30)
Jul    4216(14)    13204(29)   580(3)      17941(29)   1951(41)    1967(13)    102362(49)
Aug    39495(27)   3470(10)    432(1)      13510(22)   1886(44)    3973(13)    81759(46)
Sep    13110(14)   3626(12)    1126(3)     25430(43)   1026(23)    1608(7)     37612(44)
Oct    -           937(3)      4296(10)    19605(35)   585(15)     1079(6)     -
Nov    -           210(3)      5886(20)    9421(27)    511(13)     405(3)      -
Dec    -           702(6)      2644(18)    2010(13)    490(14)     917(6)      -
Jan    -           -           -           600(8)      702(18)     732(7)      11842(43)
Feb    -           -           -           544(8)      578(19)     1098(9)     28686(42)
Mar    -           -           -           2293(19)    661(18)     768(7)      34481(38)
Apr    -           -           -           345(2)      510(15)     1188(8)     2594(31)
May    -           -           -           34(2)       340(11)     226(6)      -
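For reference, a minimal sketch of a monthly Thomas-Fiering generator on log-transformed flows. It follows the textbook form of the model; the Matalas (1967) corrections used in the study are not reproduced, the function name is hypothetical, and the parameter values below are placeholders rather than statistics of any of the reservoir sites. Being single-site, it shares the limitation noted above: each site would be generated independently, so cross-correlations between rivers are lost.

```python
import math
import random


def thomas_fiering_log(mu, sigma, r, n_years, y0=None, seed=0):
    """Monthly Thomas-Fiering generation on log-flows y = ln(Q).

    mu[j], sigma[j]: mean and standard deviation of log-flows for month j (12 values);
    r[j]: lag-one correlation between months j and j+1 (December wraps to January).
    Matalas (1967) corrections are NOT applied here.
    """
    rng = random.Random(seed)
    y = mu[0] if y0 is None else y0
    flows = []
    for _ in range(n_years):
        for j in range(12):
            flows.append(math.exp(y))              # back-transform to flow units
            k = (j + 1) % 12                       # next month
            b = r[j] * sigma[k] / sigma[j]         # regression coefficient
            noise = rng.gauss(0.0, 1.0) * sigma[k] * math.sqrt(1.0 - r[j] ** 2)
            y = mu[k] + b * (y - mu[j]) + noise
    return flows


# Placeholder statistics (not site data): 50 years of monthly flows.
flows = thomas_fiering_log([5.0] * 12, [0.3] * 12, [0.5] * 12, n_years=50)
print(len(flows), min(flows), max(flows))
```

Working in log space guarantees positive flows after back-transformation, which is one reason the log-normal fit reported above is convenient.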

The primary purpose of the simulation is, thus, to prepare the ground for more accurate
and more systematic optimization. The parameters to which the system performance is 
sensitive, their possible ranges and the associated increments by which the parameters 
should be varied in the optimization are all identified by the simulation analysis. Table 2 
gives summary results of one of the simulation runs for a 50-year period. The table shows 
the transfers made through various links for a set of given values of storage zones and 
feasible values of INCR1 and INCR2. 


3. Optimization model 

Within the range identified for a particular parameter, an optimal value of the parameter is determined by solving an optimization model. The parameters for which optimal values are sought are $S_{rel,1}$ and $S_{rel,2}$, the releasable storage limits for the monsoon and non-monsoon seasons respectively; $S_{tra,1}$ and $S_{tra,2}$, the transferable storage limits for the two seasons; and INCR1 and INCR2, the factors by which the irrigation potential may be increased in the two seasons. A change in any one of these parameters at a critical reservoir affects the entire system. It is therefore necessary to identify the optimum values of these parameters for the fixed release and transfer policies discussed earlier. The optimization model is formulated as follows:

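$$\max \sum_{k} \sum_{t} \left( \alpha U_t^k - \beta D_t^k \right) \qquad (3)$$

subject to:

(i) Diversion policy,

$$DIV_t^k = \begin{cases} d_t^k, & \text{if } S_t^k + I_t^k \ge d_t^k \\ S_t^k + I_t^k, & \text{otherwise.} \end{cases} \qquad (4)$$

(ii) Release policy, (1)

(iii) Transfer policy, (2)

(iv) Definition constraints:

(a)
$$D_t^k = \begin{cases} d_t^k - U_t^k, & \text{if positive} \\ 0, & \text{otherwise.} \end{cases} \qquad (5)$$

(b)
$$U_t^k = DIV_t^k + R_t^k + T_t^k \qquad (6)$$

(c)
$$d_t^k = \begin{cases} INCR1 \cdot DEM_t^k, & \forall t \in \text{monsoon season} \\ INCR2 \cdot DEM_t^k, & \forall t \in \text{nonmonsoon season} \end{cases} \qquad (7)$$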

(v) Storage continuity, physical constraints and non-negativity of the variables; and 

(vi) Constraints due to priorities discussed in the simulation model. 

In this model, α represents the economic value of the water actually utilized and β represents the penalty (loss) associated with not meeting the demands. Both α and β are complex functions of operational priorities, the purpose for which the water is used, market conditions and even societal preferences. Representing all these factors by single economic coefficients is therefore a gross approximation of the economic process. However, since the purpose of the present study is to examine the physical distribution of water in the system, this approximation is deemed justifiable.
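To show how the terms of (3) to (7) fit together, here is a minimal sketch. It assumes scalar α and β shared by all reservoirs and periods, as in the text, and takes the release and transfer terms $R_t^k$ and $T_t^k$ of (6) as given inputs. Storage continuity, the release and transfer policies and the priority constraints are left out, and all function names are hypothetical.

```python
def scaled_demand(dem, is_monsoon, incr1, incr2):
    """Eq. (7): demand d = INCR1 * DEM in monsoon periods, INCR2 * DEM otherwise."""
    return (incr1 if is_monsoon else incr2) * dem


def diversion(storage, inflow, demand):
    """Eq. (4): divert the full demand if storage + inflow covers it, else all that is available."""
    available = storage + inflow
    return demand if available >= demand else available


def utilised(div, release, transfer):
    """Eq. (6): water utilised = diversion + release term + transfer term."""
    return div + release + transfer


def deficit(demand, used):
    """Eq. (5): unmet demand, or zero when the demand is met."""
    return max(demand - used, 0.0)


def objective(used, deficits, alpha, beta):
    """Eq. (3): sum over reservoirs k and periods t of alpha * U - beta * D.

    used[k][t] and deficits[k][t] are indexed by reservoir, then period.
    """
    return sum(alpha * u - beta * d
               for u_k, d_k in zip(used, deficits)
               for u, d in zip(u_k, d_k))
```

Within the simulation-optimization loop, `objective` would be evaluated on the full simulated record for each trial set of storage zones and INCR factors.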

For solving the optimization model, the Box complex algorithm (Box 1965) is used. The algorithm solves the following general problem:

$$\text{Minimise } f(x_1, x_2, \ldots, x_n) \qquad (8)$$

subject to constraints of the form $g_k \le x_k \le h_k$, $k = 1, \ldots, m$, where $x_{n+1}, \ldots, x_m$ are functions of $x_1, \ldots, x_n$, and the lower and upper constraints $g_k$ and $h_k$ are either constants or functions of $x_1, \ldots, x_n$.
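To make the search in (8) concrete, here is a minimal sketch of a Box-type complex search. It was written for illustration and does not reproduce the IMSL routine BCPOL. The sketch handles only explicit bounds on the variables; the implicit constraints on $x_{n+1}, \ldots, x_m$ are not handled. It uses 2n vertices and over-reflects the worst vertex through the centroid of the others by a factor of 1.3, as Box (1965) suggested. It also clips out-of-range coordinates to the violated bound, and pulls a trial point that is still the worst halfway towards the centroid. In the study, `f` would be the simulation model evaluated for a trial parameter set; here it is any Python callable.

```python
import random


def box_complex(f, lower, upper, x0, alpha=1.3, tol=1e-10, max_iter=5000, seed=0):
    """Minimise f(x) subject to lower <= x <= upper with a Box-type complex search.

    Simplified sketch: only explicit bounds are handled; the implicit constraints
    g_k <= x_k <= h_k on derived quantities x_{n+1}..x_m are not.
    """
    rng = random.Random(seed)
    n = len(x0)
    k = 2 * n  # number of vertices in the complex (Box suggested 2n)
    # Vertex 0 is the feasible starting point; the others are drawn uniformly in the box.
    pts = [list(x0)] + [
        [lo + rng.random() * (hi - lo) for lo, hi in zip(lower, upper)]
        for _ in range(k - 1)
    ]
    vals = [f(p) for p in pts]

    for _ in range(max_iter):
        if max(vals) - min(vals) < tol:
            break
        worst = max(range(k), key=vals.__getitem__)
        others = [i for i in range(k) if i != worst]
        centroid = [sum(pts[i][d] for i in others) / len(others) for d in range(n)]
        # Over-reflect the worst vertex through the centroid, then clip to the bounds.
        trial = [c + alpha * (c - w) for c, w in zip(centroid, pts[worst])]
        trial = [min(max(v, lo), hi) for v, lo, hi in zip(trial, lower, upper)]
        f_trial = f(trial)
        # If the trial point would still be the worst, move it halfway to the centroid.
        second_worst = max(vals[i] for i in others)
        for _ in range(60):
            if f_trial < second_worst:
                break
            trial = [(v + c) / 2.0 for v, c in zip(trial, centroid)]
            f_trial = f(trial)
        pts[worst], vals[worst] = trial, f_trial

    best = min(range(k), key=vals.__getitem__)
    return pts[best], vals[best]
```

For example, minimising $(x_1 - 1)^2 + (x_2 + 2)^2$ over the box $[-5, 5]^2$ from (4, 4) should return a point close to (1, −2). Because the extra vertices start at random positions, the outcome depends only weakly on `x0`, which matches the property of the method noted below.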

If the permissible region contains several local optima, the algorithm is more likely than similar algorithms to find a lower optimum (for a minimization problem), because it depends little on the initial point supplied (Box 1965). The IMSL subroutine BCPOL is used. It minimises a function of n variables subject to bounds on the variables using a direct-search complex method, and the function to be minimised can be given as a user-supplied subroutine. In the present case, the search algorithm determines improved values for each of the parameters based on the objective function and the direction of movement in the previous trials. For each new set of parameter values, the objective function value is determined through the simulation model. Along with the 30 INCR parameters, only 14 of the 60 storage parameters are subjected to optimization, because the initial simulation runs showed the system performance to be less sensitive to the other storage parameters. Optimization is carried out for a period of only one year. In India, flows at 75% exceedance probability are generally considered for planning purposes, so a critical sequence of flows with 75% exceedance probability is used in the model.

Since only a one-year period is considered for optimization, the choice of the initial state of the system is very important. For this purpose, the most likely range for the initial state of each reservoir is used, as obtained from a statistical analysis of the large number of simulation results. The initial values and search ranges required as input by the optimization algorithm for each of the parameters are also obtained from the simulation results. Since the optimal solution depends on the initial guesses, successive runs are carried out. The optimal parameter values of one run are used as the initial guesses for the next, and the ranges are modified accordingly. This process is continued until analyses with different initial guesses give approximately the same solution; in the present case, convergence was achieved within 8 such cycles. Table 3 gives the optimal storage zone levels obtained by this procedure with the inflows at 75% exceedance probability. These storage zones, in conjunction with the various policies adopted, resulted in a significant increase in the water availability in the Cauvery and Pennar basins without affecting the existing demands at the other basins.

Table 3. Summary results of the optimization.

Reservoir   S_rel,1 (Mm³)   S_rel,2 (Mm³)   S_tra,1 (Mm³)   S_tra,2 (Mm³)
(1, 3)      1588.34         2793.47         2373.59         3873.01
(1, 4)*     -               -               162.91          1232.78
(2, 3)      1134.35         189.32          1521.93         417.15
(2, 4)      895.74          1525.69         685.60          417.67

* There is no reservoir downstream of (1, 4); release is not possible.


4. Performance evaluation 

The performance of the system operation with the optimal solution is examined over a long period by simulation analysis using four performance indices, viz., reliability (ρ), resiliency (γ), vulnerability (ν) and deficit ratio (δ).

In defining reliability and resiliency, the concept of a 'failure index' suggested by Fiering (1982), which incorporates both the frequency and the severity of failure, is used. According to this concept, a full failure is one in which not even 75% of the demand is met. Smaller failures are measured by the expression $\Delta_i/(0.25\,T_i)$, where $\Delta_i$ is the deficit and $T_i$ is the target in period i, with $0 \le \Delta_i/(0.25\,T_i) \le 1$. The failure index F is calculated as the ratio of the sum of all the failures to the total number of periods. The reliability (ρ) is defined in this study as 1 − F. The resiliency (γ) is defined as the ratio of the number of transitions from a failure state to a satisfactory state to the total number of failures. Because the failure index is incorporated, the definition of resiliency also accounts for improvement in performance from a larger failure to a smaller one. Thus, these two indices (ρ and γ) are defined slightly differently from the definitions suggested by Hashimoto et al (1982). Vulnerability (ν) is defined as the ratio of the largest deficit during the period of operation to the corresponding demand at the reservoir. The deficit ratio (δ), defined as the ratio of the total deficit to the total demand, is used to measure the effect of cumulative deficit. Desirably, system operation should result in high reliability and resiliency and low vulnerability and deficit ratio.

Table 4. Summary of the yearly performance indices for the system.

Reliability (ρ)      0.946
Resiliency (γ)       0.680
Vulnerability (ν)    0.474
Deficit ratio (δ)    0.014
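The index definitions above can be computed from a simulated series of targets and supplies. The sketch below is a simplified illustration. It assumes a single demand series and Fiering's failure size $\min(\Delta_i/(0.25\,T_i), 1)$. It also uses a plain count-based resiliency: transitions from a failure period to a fully satisfactory period, divided by the number of failure periods. The study's refinement that credits movement from a larger to a smaller failure is not reproduced, and the function names are hypothetical.

```python
def failure_size(target, supply):
    """Fiering-type failure measure: 0 if the target is met, 1 for a full failure."""
    if target <= 0:
        return 0.0
    deficit = max(target - supply, 0.0)
    return min(deficit / (0.25 * target), 1.0)


def performance_indices(targets, supplies):
    """Return (reliability, resiliency, vulnerability, deficit_ratio) for one series."""
    n = len(targets)
    f = [failure_size(t, s) for t, s in zip(targets, supplies)]
    reliability = 1.0 - sum(f) / n  # rho = 1 - F

    n_fail = sum(1 for x in f if x > 0)
    recoveries = sum(1 for a, b in zip(f, f[1:]) if a > 0 and b == 0)
    resiliency = recoveries / n_fail if n_fail else 1.0  # no failures: treat as fully resilient

    deficits = [max(t - s, 0.0) for t, s in zip(targets, supplies)]
    worst = max(range(n), key=deficits.__getitem__)
    vulnerability = deficits[worst] / targets[worst] if targets[worst] > 0 else 0.0

    deficit_ratio = sum(deficits) / sum(targets)
    return reliability, resiliency, vulnerability, deficit_ratio
```

For four periods with target 100 and supplies 100, 80, 50 and 100, the failure sizes are 0, 0.8, 1 and 0. Up to floating-point rounding, the sketch returns ρ = 0.55, γ = 0.5, ν = 0.5 and δ = 0.175.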

A 500-year simulation is carried out with the synthetic streamflow sequences to estimate the performance indices. The indices are estimated for each reservoir and for the system as a whole, for each month and over the whole year. In addition, the reliability of two of the prominent transfer links is estimated with 'committed transfers' as obtained from the optimization analysis. These links are Polavaram (1, 4) to Prakasam Barrage (2, 6) and Srisailam (2, 3) to Upper Anicut (4, 3). The performance of the system as a unit over the entire period is given in table 4. The monthly performance indices of the system and the yearly performance indices of the reservoirs are given in tables 5 and 6 respectively.

It must be noted that the performance values given in these tables correspond to the optimal storage zone levels and increased irrigation supplies specified by the solution of the optimization model. Although the reliability of meeting the increased demands is quite high, the vulnerability of the system to large deficits is high too. This signifies that in the few periods in which failure occurs, a large deficit may result. It is also important to note that none of these criteria is explicitly included in the optimization, so these values are not, in fact, 'optimum' values of system performance. They are, however, indicators of how the system is likely to perform under the 'optimal' operation over a long period of time, if a sequence of inflows like that used in the simulation is actually realized.




5. Conclusions 


In this study, a simulation-optimization approach is used to analyse the complex multibasin, multireservoir Peninsular Indian river system. A simulation model of the type presented in this study is essential for building a database of various operating strategies. The sensitive parameters identified from a large number of simulation runs are optimized over a wide range of alternatives, and a solution for the system operation is obtained with inflows at a specified probability of exceedance. The long-term performance of the system under this solution is then analysed again by simulation.



Table 5. Monthly performance indices of the system. 
It is observed that the performance of the system can be improved considerably by operating the system as a single unit. If planning is carried out with inflows at 75% probability of exceedance, the existing irrigation demands can be increased by about 26%, and these increased demands can be satisfied with high reliability. The study has indicated that two transfer links, Polavaram to Prakasam Barrage and Srisailam to Upper Anicut through Mylavaram, play a significant role in enhancing the water availability in the Cauvery and Pennar basins.

List of symbols 

$DEF_t^L$  deficit in reservoir L in period t,
$DEM_t^k$  existing irrigation demand at reservoir k in period t,
$DIV_t^k$  diversion from reservoir k in period t to meet its own demand,
$D_t^k$  deficit in reservoir k in period t,
$d_t^k$  water demand at reservoir k in period t,
F  failure index,
$I_t^k$  natural inflow to reservoir k in period t,
INCR  factor by which irrigation demands are multiplied (> 1.0),
$INCR1^k$  factor by which the monsoon irrigation demands at reservoir k are multiplied (> 1.0),
$INCR2^k$  factor by which the nonmonsoon irrigation demands at reservoir k are multiplied (> 1.0),
j  season index; j = 1 for monsoon season and j = 2 for nonmonsoon season,
L  index for the reservoir to which a release from reservoir M is possible,
M  index for the current reservoir (the reservoir from which releases and transfers are being computed),
P  index for the reservoir to which a transfer from reservoir M is possible,
$R_t^{M,L}$  release from reservoir M to reservoir L in period t,
   storage at reservoir M in period t after accounting for diversions and releases,
$S_t^k$  storage at the beginning of period t in reservoir k,
$S_t^L$  storage in reservoir L in period t after accounting for transfers and releases already committed to reservoir L from reservoirs other than reservoir M,
$S_{max}$  maximum storage,
$S_{min}$  minimum storage,
$S_{rel}$  storage level above which a release is allowed,
$S_{rel,j}$  storage level at a reservoir for season j above which release from the reservoir is allowed,
$S_{rel,j}^M$  storage level of reservoir M in season j above which release from reservoir M is allowed,
$S_{tra}$  storage level above which a transfer is allowed,
$S_{tra,j}$  storage level at a reservoir for season j above which transfer from the reservoir is allowed,
$S_{tra,j}^M$  storage level of reservoir M in season j above which transfer is allowed,
t  period index,
$T_i$  target in period i,
$T_t^{M,P}$  amount of water transferred from reservoir M to reservoir P in period t,
$U_t^k$  amount of water utilised from reservoir k in period t,
γ  resiliency,
$\Delta_i$  deficit in period i,
δ  deficit ratio,
ν  vulnerability,
ρ  reliability.
Reservoir operation for hydropower optimization:
A chance-constrained approach 




K R SREENIVASAN and S VEDULA 

Department of Civil Engineering, Indian Institute of Science, 

Bangalore 560012, India 

e-mail: [krsree,svedula]@civil.iisc.ernet.in

MS received 23 August 1995 

Abstract. This paper presents a chance-constrained linear programming formulation for the operation of a multipurpose reservoir. The release policy is defined by a chance constraint: the probability of the irrigation release in any period equalling or exceeding the irrigation demand must be at least equal to a specified value P (called the reliability level). The model determines the maximum annual hydropower produced while meeting the irrigation demand at a specified reliability level. The model considers variation in reservoir water level elevation and also the operating range within which the turbine operates. A linear approximation of the nonlinear power production function is assumed, and the solution is obtained within a specified tolerance limit. The inflow into the reservoir is considered random. The chance constraint is converted into its deterministic equivalent using a linear decision rule and the inflow probability distribution. The model application is demonstrated through a case study.

Keywords. Reservoir operation; hydropower; optimization; chance constraint; 
rule curve. 


1. Introduction 

Operation of a multipurpose reservoir for irrigation and hydropower requires a strategy, as the water is used for two conflicting demands. The available storage has to be allocated among multiple purposes considering the stochasticity of inflow. Hydropower optimization for a multipurpose reservoir with fixed irrigation demand and deterministic supply was studied by Sreenivasan & Vedula (1994). Reservoir operation for irrigation at a specified level of reliability (with inflow treated as a random variable) was studied by Vedula & Sreenivasan (1992). The present study combines the features of these two studies. It optimizes the hydropower production from a multipurpose reservoir with random inflows and a specified reliability of meeting irrigation demand. Chance-constrained linear programming (CCLP) is used in the model formulation. The model determines (i) the maximum hydropower that can be produced while meeting the irrigation demand at a specified reliability, and (ii) the end-of-period storages (rule curve) to be maintained in the reservoir. The release policy is expressed through a chance constraint. The nonlinear hydropower production function is the product of the power release and the head acting on the turbine; it is linearized using the approximation given in Loucks et al (1981, pp 248 ff.). The model considers the variation in the head on the turbine that follows from variation in reservoir water level elevation, and also the range within which the turbine operates.

A reservoir system operated for irrigation and hydropower typically consists of a reservoir with left and right bank canals leading to the irrigated area, and a powerhouse on the bed of the river. The irrigation canals may also have powerhouses on them; power is produced in them through a diversion of the irrigation releases into the canals. The riverbed turbine produces power only through the release made downstream of the reservoir. In the present study, the irrigation demand is lumped at the reservoir as a requirement for releases into the canals.


2. Objective 

The objective of the study is to formulate a mathematical programming model to determine (i) the maximum annual hydropower produced from a multipurpose reservoir while meeting the irrigation demands at various specified levels of reliability, and (ii) the rule curve storages for the optimal operation of the reservoir.


3. The model 

The model is formulated for a single reservoir serving multiple purposes (irrigation and hydropower). Inflow into the reservoir is considered random. The head on the turbine varies from time to time as the reservoir water level changes. Also, hydropower can be produced only when the reservoir water level lies in the operating range specified for the reservoir; no power can be produced outside this range. Chance-constrained linear programming is used in the formulation.

3.1 Release policy 

The reservoir release policy is defined by the chance constraint (1) below. It states that the probability (Pr) of the irrigation release in time period t equalling or exceeding the irrigation demand is not less than a specified value P. P is referred to hereafter as the reliability level.

$$\Pr[IRA_t \ge ID_t] \ge P, \qquad (1)$$

where $IRA_t$ is the irrigation release during period t, $ID_t$ is the irrigation demand for period t, and P is the specified reliability level.
3.2 Reservoir water balance

The reservoir storage continuity relationship is expressed as

$$S_t + I_t - IRA_t - BP_t - EV_t = S_{t+1}, \qquad (2)$$

where $S_t$ is the storage at the beginning of period t, $I_t$ is the random inflow into the reservoir, $IRA_t$ is the irrigation release, $BP_t$ is the downstream release (assumed deterministic) for bed power production, and $EV_t$ is the evaporation loss during period t. $EV_t$ is approximated (in the range of active storage) by a linear relationship of the form

$$EV_t = a_t + \beta_t (S_t + S_{t+1}), \qquad (3)$$

where $a_t$ and $\beta_t$ are coefficients depending on the period t. Substituting for the evaporation term from (3) into (2) and rearranging, we get

$$IRA_t = (1 - \beta_t) S_t - (1 + \beta_t) S_{t+1} + I_t - BP_t - a_t. \qquad (4)$$

Substituting (4) into the chance constraint (1), one gets

$$\Pr\left[(1 + \beta_t) S_{t+1} - (1 - \beta_t) S_t + BP_t + a_t + ID_t \le I_t\right] \ge P. \qquad (5)$$

This is the final form of the chance constraint. Its deterministic equivalent is written using a linear decision rule (LDR) as follows.

3.3 Linear decision rule

The following linear decision rule is considered:

$$IRA_t = S_t + I_t - BP_t - EV_t - b_t, \qquad (6)$$

where $b_t$ is a non-random, non-negative operating policy parameter. Equation (6) indicates a release equal to the total available quantity, $S_t + I_t - BP_t - EV_t$, less some fixed amount $b_t$.

Substituting (6) into the reservoir continuity equation (2), the linear storage rule is obtained:

$$S_{t+1} = b_t. \qquad (7)$$

Employing this rule in (6), it can be seen that the variance of inflow is transferred directly to the irrigation release. This is because $EV_t$ depends only on the storages, which are now deterministic, and $BP_t$ is deterministic by assumption.

3.4 Deterministic equivalent

Substituting (7) in the chance constraint (5), the deterministic equivalent is written as

$$(1 + \beta_t) b_t - (1 - \beta_t) b_{t-1} + BP_t + a_t + ID_t \le I_t^{(1-P)}, \qquad (8)$$

where $I_t^{(1-P)}$ is the reservoir inflow during period t with non-exceedance probability (1 − P), i.e. with exceedance probability P.
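Because (8) is linear in the rule-curve storages $b_t$, a candidate rule curve can be checked period by period. The sketch below computes the slack, $I_t^{(1-P)}$ minus the left side of (8); a non-negative slack means the deterministic equivalent holds in that period. The sketch assumes the cyclic condition $b_0 = b_n$ used with the storage capacity constraints and constant coefficients a and β. For the example, it also assumes that period 1 of Table 3 corresponds to June, the first month of Table 1. The function name `eq8_slacks` is hypothetical.

```python
def eq8_slacks(b, inflow_p, demand, bed_release, a, beta):
    """Slack of Eq. (8) per period: I_t^(1-P) - [(1+beta) b_t - (1-beta) b_{t-1} + BP_t + a + ID_t].

    b holds the end-of-period storages b_1..b_n (Mm^3); b_0 is taken equal to b_n.
    """
    slacks = []
    for t in range(len(b)):
        b_prev = b[t - 1]  # when t == 0, b[-1] is b_n, i.e. the cyclic condition b_0 = b_n
        lhs = (1 + beta) * b[t] - (1 - beta) * b_prev + bed_release[t] + a + demand[t]
        slacks.append(inflow_p[t] - lhs)
    return slacks


# Example data: Table 1 (Jun..May) and Table 3 (periods 1..12), assuming period 1 = June.
inflow_065 = [163.40, 813.20, 702.97, 261.73, 202.81, 89.31,
              50.52, 26.93, 17.10, 10.64, 11.70, 11.06]
demand = [119.90, 136.80, 200.60, 195.80, 203.20, 189.70,
          109.40, 137.30, 180.10, 197.39, 197.90, 178.60]
bed_release = [0.0, 0.0, 0.0, 60.454] + [0.0] * 8
storage = [266.228, 923.306, 1402.948, 1384.308, 1359.935, 1236.000,
           1154.201, 1021.551, 837.220, 630.409, 425.288, 240.000]

slacks = eq8_slacks(storage, inflow_065, demand, bed_release, a=7.388, beta=0.003)
print([round(s, 2) for s in slacks])
```

With these numbers every slack is positive, so under the stated assumptions the published rule curve satisfies (8) in each period.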
3.5 Other constraints

3.5a Storage capacity constraints: The storage in any time period t shall not be less than the dead storage capacity ($K_d$) and shall not exceed the total capacity of the reservoir (K):

$$b_{t-1} \ge K_d, \qquad (9)$$

$$b_{t-1} \le K, \qquad (10)$$

with $b_0 = b_n$.

3.5b Power plant capacity: The energy produced by the bed turbine in any time period t, $EB_t$, shall not exceed the energy corresponding to the installed capacity of the turbine, BPC; thus

$$EB_t \le BPC. \qquad (11)$$

3.5c Head-storage relationship: For computing the head over the turbine, the reservoir elevation $H_t$ above the river bed in any period t is taken to be the average of the elevations at the beginning and end of the period. The following linear relationship is assumed within the range of storages defined by (9) and (10):

$$H_t = \gamma \left[ \frac{b_{t-1} + b_t}{2} \right] + \delta, \qquad (12)$$

where γ is the slope of the linear portion of the elevation-storage curve and δ is the intercept.

3.5d Linear approximation for power production function: Following Sreenivasan & Vedula (1994), the nonlinear power production term, $EB_t = c\,BP_t (H_t - BTAIL)$, is approximated linearly as

$$EB_t = c\left[BP_t (H_t^0 - BTAIL) + BP_t^0 (H_t - BTAIL) - BP_t^0 (H_t^0 - BTAIL)\right], \qquad (13)$$

where $BP_t^0$ is the approximate value for the bed power release $BP_t$ in period t, $H_t^0$ is the approximate value for the reservoir elevation $H_t$ in period t, BTAIL is the tail race elevation of the bed turbine above the river bed, and c is a constant that converts the product of the flow and the head over the turbine into the energy produced by the turbine in period t. This constraint is active only when the reservoir elevation is within the operating range ($H_{min} \le H_t \le H_{max}$) for bed turbine operation, $H_{min}$ and $H_{max}$ being specified. $EB_t$ is set to zero outside this range.
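Equation (13) is the first-order Taylor expansion of $c\,BP_t(H_t - BTAIL)$ about $(BP_t^0, H_t^0)$. The dropped term is exactly $c\,(BP_t - BP_t^0)(H_t - H_t^0)$, which vanishes as the successive runs converge. The sketch below checks this numerically and applies (12) and the operating-range rule. It takes γ, δ, c, $H_{min}$ and $H_{max}$ from the case study. BTAIL is not reported in the text, so the value used is hypothetical, as are the function names.

```python
def elevation(b_prev, b_curr, gamma=0.0135, delta=30.6):
    """Eq. (12): elevation above river bed (m) from start/end storages (10^6 m^3)."""
    return gamma * (b_prev + b_curr) / 2.0 + delta


def energy_exact(bp, h, c, btail):
    """Nonlinear term c * BP_t * (H_t - BTAIL)."""
    return c * bp * (h - btail)


def energy_linear(bp, h, bp0, h0, c, btail):
    """Eq. (13): first-order expansion of the nonlinear term about (BP0, H0)."""
    return c * (bp * (h0 - btail) + bp0 * (h - btail) - bp0 * (h0 - btail))


def bed_energy(bp, h, bp0, h0, c, btail, h_min=36.88, h_max=56.693):
    """Linearized energy inside the turbine operating range, zero outside it."""
    if not (h_min <= h <= h_max):
        return 0.0
    return energy_linear(bp, h, bp0, h0, c, btail)


C = 0.002268   # from the case study (10^6 kWh per 10^6 m^3 per metre of head)
BTAIL = 5.0    # hypothetical tail-race elevation; not given in the text
bp0, h0 = 60.0, 48.0
bp, h = 62.0, 48.5
err = energy_exact(bp, h, C, BTAIL) - energy_linear(bp, h, bp0, h0, C, BTAIL)
# err equals C * (bp - bp0) * (h - h0): the only term the linearization drops
```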

3.6 Objective function 

The objective is to maximize the annual hydropower production by the bed turbine:

$$\text{Maximize} \sum_{t} EB_t. \qquad (14)$$

The objective function (14), along with constraints (8) through (13), constitutes the chance-constrained linear programming formulation.
4. Methodology

The CCLP model is run for a specified value of P (reliability of meeting irrigation demand). Initially, the solution is obtained by assuming reasonable values for $H_t^0$ and $BP_t^0$. If the values of $H_t$ and $BP_t$ in the solution differ from these, a second run is made, replacing $H_t^0$ and $BP_t^0$ by $H_t$ and $BP_t$ respectively. The CCLP model is thus run successively, each time replacing $BP_t^0$ by $BP_t$ and $H_t^0$ by $H_t$, until convergence is obtained, as explained by Sreenivasan & Vedula (1994). The optimal solution is assumed to have converged when $H_t \approx H_t^0$ and $BP_t \approx BP_t^0$ within a specified tolerance limit (set as $10^{-3}$ in the present study). In a similar manner, the model is run for different specified levels of reliability, each time with a reliability higher than in the earlier run, until the solution becomes infeasible. This determines the highest possible reliability of meeting the irrigation demands for the given inflow series and reservoir capacity.
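The two nested procedures described above can be sketched as follows: successive re-linearization for a fixed P, and a sweep over increasing P until infeasibility. This is an illustration only, and no LP solver is included. `solve` stands for a hypothetical routine that solves the CCLP for given expansion values $H_t^0$, $BP_t^0$ and returns the new $H_t$, $BP_t$. `is_feasible` is a hypothetical feasibility test. The toy solver in the example is a stand-in chosen only so that the loop visibly converges.

```python
def successive_linearization(solve, h0, bp0, tol=1e-3, max_iter=100):
    """Re-solve with the previous solution as the expansion point until H and BP settle.

    solve(h0, bp0) -> (h, bp) is a hypothetical CCLP solver for fixed expansion values.
    """
    for _ in range(max_iter):
        h, bp = solve(h0, bp0)
        change = max(abs(x - y) for x, y in zip(h + bp, h0 + bp0))
        if change <= tol:
            return h, bp
        h0, bp0 = h, bp
    raise RuntimeError("successive linearization did not converge")


def max_feasible_reliability(is_feasible, levels):
    """Try reliability levels in increasing order; return the last feasible one (or None)."""
    best = None
    for p in sorted(levels):
        if not is_feasible(p):
            break
        best = p
    return best


# Toy stand-in for the CCLP solver: a contraction whose fixed point is H = 40, BP = 10.
def toy_solve(h0, bp0):
    return [(x + 40.0) / 2 for x in h0], [(x + 10.0) / 2 for x in bp0]


h, bp = successive_linearization(toy_solve, [30.0], [0.0])
p_max = max_feasible_reliability(lambda p: p < 0.675, [0.50, 0.55, 0.60, 0.65, 0.70])
```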

5. Application 

The model is applied to a reservoir system in South India. The total capacity of the reservoir is 2024 Mm³, with a dead storage capacity of 240 Mm³. For the model application in the present study, the reservoir is conceptualized as releasing water through a single irrigation canal to a composite command comprising the left and right bank command areas. This is done because the model cannot deal with two different random releases (into the left and right bank canals) that share the same source of supply (the reservoir). The model is applied using the month as the time period.

The installed capacity of the bed turbine is 24 000 kW. The value of c in (13) is 0.002268, with $EB_t$ in $10^6$ kilowatt hours and $BP_t$ and $H_t$ expressed in $10^6$ cubic metres and metres respectively. The value of BPC (the energy corresponding to the installed capacity of the riverbed turbine) works out to 10.87 million kilowatt hours per month (for a standard month of 730.5 hours with an assumed load factor of 0.62).
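The figure of 10.87 million kWh per month follows directly from the installed capacity, the standard month and the load factor. The sketch below reproduces it. As an additional check, it also backs out the overall plant efficiency implied by c = 0.002268; this is our own inference, not a statement in the paper, and assumes a water density of 1000 kg/m³ and g = 9.81 m/s².

```python
INSTALLED_KW = 24_000      # bed turbine installed capacity (kW)
HOURS_PER_MONTH = 730.5    # standard month (h)
LOAD_FACTOR = 0.62         # assumed load factor
C = 0.002268               # 10^6 kWh per (10^6 m^3 of release x 1 m of head)

# Energy ceiling BPC in million kWh per month.
bpc = INSTALLED_KW * HOURS_PER_MONTH * LOAD_FACTOR / 1e6

# Ideal energy of 10^6 m^3 of water falling through 1 m: rho*g*V*h joules -> 10^6 kWh.
RHO, G = 1000.0, 9.81
ideal_c = RHO * G * 1e6 * 1.0 / 3.6e6 / 1e6
implied_efficiency = C / ideal_c  # inference only; not stated in the paper

print(f"BPC = {bpc:.2f} M kWh/month; implied efficiency = {implied_efficiency:.3f}")
```

This prints `BPC = 10.87 M kWh/month; implied efficiency = 0.832`, so c corresponds to an overall efficiency of roughly 83%.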

The total hydropower produced consists of the power produced by the canal powerhouses and by the riverbed powerhouse. Canal powerhouse production depends on the irrigation release, which is decided by the reliability criterion, so the power produced by the canal powerhouses is incidental to this release. For this reason, only riverbed turbine power production is optimized in the present study. Hereafter in the paper, 'hydropower production' refers only to bed turbine power production.

5.1 Data 

Analysis of the elevation-storage curve of the reservoir gave γ = 0.0135 and δ = 30.6 in (12), with elevation in metres and storage in $10^6$ cubic metres. Monthly inflow data for 52 years (1930-31 to 1981-82) were used to prepare the inflow sequence for a given run. The inflows in each month are ranked, and the values at a specified probability level are determined using the Weibull formula. Table 1, for example, gives the monthly inflows with P = 0.65, along with the irrigation demands.
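The Weibull formula assigns the m-th largest of n values an exceedance probability of m/(n + 1). The sketch below picks the flow for a given exceedance probability P from one month's record. Linear interpolation between adjacent plotting positions is an assumption, since the text does not say how values between ranks were chosen. `flow_at_exceedance` is a hypothetical name.

```python
def flow_at_exceedance(flows, p):
    """Flow with exceedance probability p using Weibull plotting positions m/(n+1).

    Values between plotting positions are linearly interpolated (an assumption);
    p outside the plotted range returns the largest or smallest observation.
    """
    ranked = sorted(flows, reverse=True)          # rank 1 = largest flow
    n = len(ranked)
    pos = [m / (n + 1) for m in range(1, n + 1)]  # exceedance probability of rank m
    if p <= pos[0]:
        return ranked[0]
    if p >= pos[-1]:
        return ranked[-1]
    for i in range(n - 1):
        if pos[i] <= p <= pos[i + 1]:
            frac = (p - pos[i]) / (pos[i + 1] - pos[i])
            return ranked[i] + frac * (ranked[i + 1] - ranked[i])
    return ranked[-1]  # not reached for valid input
```

With 52 years of record, P = 0.65 falls between ranks 34 and 35 (plotting positions 34/53 and 35/53). Applying the function month by month would give a single-probability sequence such as the one in Table 1.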
Table 1. Inflows at P = 0.65 and irrigation demands.

Month   Inflow (P = 0.65) (Mm³)   Irrigation demand (Mm³)
Jun     163.40                    119.90
Jul     813.20                    136.80
Aug     702.97                    200.60
Sep     261.73                    195.80
Oct     202.81                    203.20
Nov     89.31                     189.70
Dec     50.52                     109.40
Jan     26.93                     137.30
Feb     17.10                     180.10
Mar     10.64                     197.39
Apr     11.70                     197.90
May     11.06                     178.60


In (3), values of a = 7.388 and β = 0.003 were used in place of $a_t$ and $\beta_t$ respectively for all t, for the reservoir under study, as given by Vedula et al (1986). The minimum and maximum reservoir elevations for bed turbine operation are specified as $H_{min}$ = 36.88 m and $H_{max}$ = 56.693 m above the riverbed.

6. Results and discussion 

Model runs were made for different reliability levels in increasing order in steps of 0.05. The maximum possible reliability of meeting the irrigation demand is found to be 0.65, and the associated maximum annual hydroenergy produced by the bed turbine is 5.68 M kWh. Table 2 shows the maximized annual hydropower produced by the bed turbine for various reliability levels of meeting the irrigation requirement, and figure 1 presents a plot of reliability vs hydropower produced. From the curve, one can find the maximum annual energy that can be produced by the bed turbine for a specified reliability level of meeting the irrigation demands. Alternatively, one can find the maximum reliability of irrigation associated with a given level of hydropower production.

Table 3 shows the model results for a reliability level P of 0.65. The table gives the downstream release for the bed turbine, the reservoir water elevation, the hydropower produced by the bed turbine and the end-of-period storage in each period. The end-of-period storages define the rule curve for reservoir operation. Figure 2 presents the rule curve for optimal operation of the reservoir for P = 0.65.

Table 2. Reliability and annual bed turbine power production.

Reliability*              0.50    0.55    0.60    0.65
Energy produced (M kWh)   56.17   40.10   17.40   5.68

* Of meeting irrigation demand.

Figure 1. Annual energy produced (M kWh) vs reliability level.

7. Conclusions 

A chance-constrained linear programming model is formulated for a multipurpose reservoir to determine the maximum annual hydropower production by the bed turbine, while meeting the irrigation requirements at a specified level of reliability for a given reservoir capacity. The model considers (i) a linear approximation for the nonlinear power production function, (ii) variations in the head over the turbine in each period, and (iii) the operating range of reservoir water levels for bed turbine power production.

The model can be used to determine (i) the maximum annual hydropower that can be produced at different levels of reliability (of meeting the irrigation demands), and (ii) the optimal end-of-period storages (rule curve storages) for reservoir operation. Also, the maximum reliability of irrigation associated with a given level of hydropower production can be determined.

Table 3. Model results.

Period   Power release   Reservoir       Energy produced   End-of-period
         (Mm³)           elevation (m)   (M kWh)           storage (Mm³)
1        0.000           33.290          0.000             266.228
2        0.000           37.731          0.000             923.306
3        0.000           45.120          0.000             1402.948
4        60.454          48.117          5.678             1384.308
5        0.000           47.837          0.000             1359.935
6        0.000           46.873          0.000             1236.000
7        0.000           45.536          0.000             1154.201
8        0.000           44.142          0.000             1021.551
9        0.000           42.082          0.000             837.220
10       0.000           39.539          0.000             630.409
11       0.000           36.862          0.000             425.288
12       0.000           34.324          0.000             240.000

Figure 2. End-of-period storage (Mm³) for P = 0.65.
Mathematical formulation for estimation of baseline in
synthetic aperture radar interferometry 
Abstract. Terrain height estimation through spaceborne interferometric synthetic aperture radar (INSAR) requires accurate knowledge of the orbital shift between repeat passes. Mathematical models are available for the estimation of the horizontal orbital shift. In reality, however, the orbital shift between repeat passes is two-dimensional for the same azimuth scanline. In this paper, a new mathematical formulation is developed for estimating the two-dimensional orbital shift of INSAR from the fringe line pattern in the interferogram of a flat earth.

Keywords. Interferometric synthetic aperture radar; baseline estimation; horizontal orbital shift; fringe line pattern.


1. Introduction 

Topographic maps of the terrain surface have found numerous uses in geology, natural resources, land use management, hydrology, remote sensing, etc. In general, topographic maps can be generated from stereo pairs of optical photographs or of radar imagery; in both cases the resolution depends on the ground cell size. In both airborne and spaceborne Interferometric Synthetic Aperture Radar (INSAR), two SAR phase images of the same scene are combined coherently to form a phase interferogram. The terrain elevation can be derived from this interferogram (Zebker & Goldstein 1986). In the airborne INSAR system, two antennae are used simultaneously to receive the signals. In repeat-pass orbit INSAR, by contrast, the satellite (spaceborne) system carries a single antenna, and two images of the same scene taken during different orbital passes are combined (Zebker & Goldstein 1986; Madsen et al 1993). The phase difference between the two passes contains information about the ground elevation. It is known only modulo 2π (it varies between 0 and 2π) because of the rotation of the phase vector. The performance of the radar interferometer depends upon system parameters such as frequency, resolution and orbital parameters (baseline vector). It also depends upon errors introduced during data processing and post-processing by signal-to-noise ratio, number of looks, pixel misregistration, baseline decorrelation and phase aliasing (Li & Goldstein 1990; Lin et al 1991, 1992; Hagberg & Ulander 1993). In repeat-pass interferometry, significant error results from inaccuracies in the knowledge of the INSAR orbital shift (also called the baseline). This baseline is used in the available mathematical models (Zebker & Goldstein 1986; Lin et al 1991) to derive a topographic map. For each pixel corresponding to a given point of the area in both images, the phase difference value measures the difference in path length from that pixel to each antenna of the SAR interferometer. Using the knowledge of the orbital parameters and the phase difference interferogram, the terrain elevation can then be computed on a pixel-by-pixel basis.

For a flat earth, the phase difference value at a point increases from zero to 2π, then
drops to zero again, forming a saw-tooth pattern in the range direction (Lin et al 1991).
The average fluctuations between adjacent pixels are very small. The sharp transition of the
phase difference value from 2π to zero is called a fringe line in the phase interferogram
(Lin et al 1991). The two complex images are combined to form a phase difference image,
called an interferogram. For a flat earth, the many fringe lines of a definite pattern in the
interferogram give precise information on the baseline vectors or orbital parameters during
repeat passes (Lin et al 1991, 1992). Before the mathematical models are applied, the phase
interferogram image is unwrapped by adding 2π wherever fringe lines occur. For the
two images to be properly registered, the Δy shift of the satellite repeat pass orbit is taken as zero,
and the Δx and Δz shifts form the baseline vector. The variation in the fringe
pattern contains the information about surface topography. However, in the presence of
phase noise, it becomes difficult to decide where the transition of the phase difference value
from 2π to 0 occurs. Different phase-unwrapping techniques are used to reduce phase
noise prior to the estimation of terrain elevation (Lin et al 1992; Madsen et al 1993). For
any terrain, the phase difference value varies randomly between the fringe lines. Lin et al
(1991) have derived a mathematical formulation to estimate the one-dimensional horizontal
shift from the distance between fringe lines. In reality, however, this orbital shift
is two-dimensional for the same azimuth scanline. In this paper, a new mathematical
formulation is presented, based on the detection of three consecutive fringe lines on a flat
earth, to accurately estimate the two-dimensional orbital shift of INSAR. It is observed
through modelling that consecutive fringe lines are formed at such a small distance
that the ellipticity of the earth has no bearing on the results and that, in the absence of
any topography, the earth can be assumed to be flat between these consecutive fringe
lines.
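The flat-earth saw-tooth pattern and the unwrapping step described above can be illustrated with a minimal one-dimensional sketch. It assumes a noise-free, linearly increasing flat-earth phase along range (a hypothetical ramp, not ERS data), wraps it into [0, 2π), marks a fringe line wherever the wrapped phase drops by more than π between adjacent samples, and unwraps by adding 2π after each fringe line. The function names are illustrative only; real interferograms contain phase noise, which is why the more robust unwrapping techniques cited above are needed.

```python
import math

TWO_PI = 2.0 * math.pi


def wrap(phase):
    """Wrap absolute phases (radians) into [0, 2*pi): the saw-tooth of the interferogram."""
    return [p % TWO_PI for p in phase]


def fringe_lines(wrapped):
    """Indices i where the wrapped phase drops sharply (about 2*pi -> 0) from sample i-1 to i."""
    return [i for i in range(1, len(wrapped)) if wrapped[i] - wrapped[i - 1] < -math.pi]


def unwrap(wrapped):
    """Add 2*pi after every fringe line (and subtract 2*pi after every upward jump)."""
    out = [wrapped[0]]
    offset = 0.0
    for i in range(1, len(wrapped)):
        step = wrapped[i] - wrapped[i - 1]
        if step < -math.pi:      # fringe line: phase fell from ~2*pi to ~0
            offset += TWO_PI
        elif step > math.pi:     # opposite jump (phase decreasing along range)
            offset -= TWO_PI
        out.append(wrapped[i] + offset)
    return out


# Hypothetical flat-earth ramp: 0.3 rad per range sample, 100 samples.
true_phase = [0.3 * i for i in range(100)]
w = wrap(true_phase)
u = unwrap(w)
print("fringe lines at samples:", fringe_lines(w))
print("max unwrapping error:", max(abs(a - b) for a, b in zip(u, true_phase)))
```

Detecting jumps larger than π is the simplest possible rule; it works here only because adjacent samples differ by much less than π.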


2. Mathematical formulation 

In INSAR, complex images (Lin et al 1992) of the same region are taken from
two successive orbits. A and B are the positions of the satellite in the two successive orbits;
Δx and Δz are the horizontal and vertical shifts of position B with respect to position A,
as shown in figure 1. The satellite-borne synthetic aperture radar (SAR) operates h metres
above the ground and looks to the side with an incidence angle of α degrees. P and Q
are the points of interest, which are z₁ and z₂ metres above the ground reference, and Δh is the
relative height between P and Q. A ground point P, x₁ metres away from the nadir, is at a
distance ρ₁ from the first orbit position A and ρ₂ from the second orbit position B. Hence, the path
length difference at ground position P can be written as

$$p = \rho_1 - \rho_2 = AP - BP. \qquad (1)$$

Similarly, the path difference p' at ground position Q can be written as

$$p' = AQ - BQ. \qquad (2)$$

Figure 1. Geometry of interferometric SAR.

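Hence,

$$\begin{aligned}
p &= AP - BP \\
&= \{(h - z_1)^2 + x_1^2\}^{1/2} - \{(h - z_1 + \Delta z)^2 + (x_1 - \Delta x)^2\}^{1/2} \\
&= \{h^2 + z_1^2 - 2hz_1 + x_1^2\}^{1/2} \\
&\quad - \{h^2 + z_1^2 - 2hz_1 + \Delta z^2 + 2h\Delta z - 2z_1\Delta z + x_1^2 + \Delta x^2 - 2x_1\Delta x\}^{1/2}.
\end{aligned} \qquad (3)$$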


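As shown in figure 1, h² + x₁² = d₁².
Hence, (3) can be reduced to

$$p = d_1\left\{\left(1 + \frac{z_1^2}{d_1^2} - \frac{2hz_1}{d_1^2}\right)^{1/2} - \left(1 + \frac{z_1^2}{d_1^2} - \frac{2hz_1}{d_1^2} + \frac{\Delta z^2}{d_1^2} + \frac{2h\Delta z}{d_1^2} - \frac{2z_1\Delta z}{d_1^2} + \frac{\Delta x^2}{d_1^2} - \frac{2x_1\Delta x}{d_1^2}\right)^{1/2}\right\}. \qquad (4)$$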


Let us assume

$$u_1 = \frac{z_1^2 - 2hz_1}{d_1^2}$$

and

$$u_2 = \frac{2}{d_1^2}\{h\Delta z - z_1\Delta z - x_1\Delta x\}.$$

Since Δz²/d₁² ≪ 1 and Δx²/d₁² ≪ 1, substituting the values of u₁ and u₂, (4) reduces to

$$p = d_1\{(1 + u_1)^{1/2} - (1 + u_1 + u_2)^{1/2}\}. \qquad (5)$$

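Using the approximation (1 + u)^{1/2} ≈ 1 + u/2 − u²/8, valid for u ≪ 1, (5) reduces to

$$p = d_1\left\{-\frac{u_2}{2} + \frac{u_2^2}{8} + \frac{u_1u_2}{4}\right\}. \qquad (6)$$

Since u₂ ≪ 1, the term u₂²/8 is negligible and (6) becomes

$$p = -\frac{d_1u_2}{2}\left(1 - \frac{u_1}{2}\right). \qquad (7)$$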


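Substituting the values of u₁ and u₂, (7) reduces to

$$p = -\frac{h\Delta z - z_1\Delta z - x_1\Delta x}{d_1} + \frac{1}{d_1^3}\left\{\frac{hz_1^2\Delta z}{2} - h^2z_1\Delta z - \frac{z_1^3\Delta z}{2} + hz_1^2\Delta z - \frac{x_1z_1^2\Delta x}{2} + x_1hz_1\Delta x\right\}. \qquad (8)$$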


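Since z₁ ≪ h, the terms hz₁²Δz/(2d₁³), z₁³Δz/(2d₁³), hz₁²Δz/d₁³ and x₁z₁²Δx/(2d₁³) are negligible compared with the terms retained. Hence, neglecting these terms, (8) reduces to

$$p = \Delta x\left(\frac{x_1}{d_1} + \frac{hx_1z_1}{d_1^3}\right) - \Delta z\left(\frac{h - z_1}{d_1} + \frac{h^2z_1}{d_1^3}\right). \qquad (9)$$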


Equation (9) estimates the path difference p for a point P on the ground reference, assuming
z₁ ≪ h and x₁ ≫ Δx, Δz (the shifts of the satellite position). In the formulation
by Lin et al (1991, 1992), the perpendicular baseline between two parallel orbits, Δx = B₁,
is taken with Δz = 0. Applying this assumption of Lin et al (1991, 1992), (9) reduces to
Lin's formula, which can be written as

$$p = \frac{x_1B_1}{d_1} + \frac{x_1hz_1B_1}{d_1^3}. \qquad (10)$$

Similarly, taking only a vertical shift Δz = B₂ with Δx = 0,

$$p = -\frac{(h - z_1)B_2}{d_1} - \frac{h^2z_1B_2}{d_1^3}. \qquad (11)$$


Equation (9) is of a general form for evaluation of path difference at a point for 
any combination of satellite shifts (horizontal or vertical) during the repeat pass 
orbit. 
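As a quick numerical check of the approximations leading to (9), the sketch below compares the exact path difference of (3) with the closed form (9). The geometry values are assumptions chosen for illustration: h = 790 km and x₁ = 300 km are the ERS values quoted for figure 3 in section 3, while z₁ = 100 m, Δx = 100 m and Δz = 50 m are hypothetical. The residual difference is dominated by the dropped (Δx² + Δz²)/(2d₁) contribution, which is below a centimetre for these values.

```python
import math


def path_difference_exact(h, x1, z1, dx, dz):
    """Exact p = AP - BP of eq. (3); B is displaced by (dx, dz) relative to A."""
    ap = math.hypot(h - z1, x1)
    bp = math.hypot(h - z1 + dz, x1 - dx)
    return ap - bp


def path_difference_approx(h, x1, z1, dx, dz):
    """Approximate p of eq. (9), with d1 = sqrt(h^2 + x1^2)."""
    d1 = math.hypot(h, x1)
    return (dx * (x1 / d1 + h * x1 * z1 / d1**3)
            - dz * ((h - z1) / d1 + h**2 * z1 / d1**3))


# Illustrative (partly hypothetical) values, all in metres.
h, x1, z1, dx, dz = 790e3, 300e3, 100.0, 100.0, 50.0
exact = path_difference_exact(h, x1, z1, dx, dz)
approx = path_difference_approx(h, x1, z1, dx, dz)
print(f"exact p = {exact:.4f} m, eq. (9) p = {approx:.4f} m, "
      f"difference = {exact - approx:.2e} m")
```

The same two functions can be reused with Δz = 0 or Δx = 0 to check the special cases (10) and (11).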

Similarly, the path difference at a point Q on the ground reference can be derived and
written as

$$p' = \Delta x\left(\frac{x_2}{d_2} + \frac{hx_2z_2}{d_2^3}\right) - \Delta z\left(\frac{h - z_2}{d_2} + \frac{h^2z_2}{d_2^3}\right). \qquad (12)$$


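The expressions for the path differences p and p' can also be written as

$$\begin{aligned}
p &= \frac{1}{d_1}(x_1\Delta x - h\Delta z) + \frac{z_1}{d_1}\left(\Delta z + \frac{hx_1\Delta x}{d_1^2} - \frac{h^2\Delta z}{d_1^2}\right), \\
p' &= \frac{1}{d_2}(x_2\Delta x - h\Delta z) + \frac{z_2}{d_2}\left(\Delta z + \frac{hx_2\Delta x}{d_2^2} - \frac{h^2\Delta z}{d_2^2}\right).
\end{aligned} \qquad (13)$$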


In (13), the first term is independent of z and corresponds to the path difference for a flat earth.
This path difference is proportional to Δx and Δz, and is a nonlinear function of x₁ or x₂,
since d₁ or d₂ is a nonlinear function of x. The second term in (13) is due to the terrain elevation
z₁ or z₂. The phase associated with the path difference between any two points P(x₁, z₁)
and Q(x₂, z₂) in the image can be determined by analysing the interferogram formed from the
two images. It can be written as


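$$\delta\varphi = \frac{4\pi}{\lambda}\,\Delta p, \qquad (14)$$

where λ is the radar wavelength and

$$\Delta p = p' - p = \Delta x\left(\frac{x_2}{d_2} + \frac{hx_2z_2}{d_2^3}\right) - \Delta z\left(\frac{h - z_2}{d_2} + \frac{h^2z_2}{d_2^3}\right) - \Delta x\left(\frac{x_1}{d_1} + \frac{hx_1z_1}{d_1^3}\right) + \Delta z\left(\frac{h - z_1}{d_1} + \frac{h^2z_1}{d_1^3}\right).$$

Hence, (14) can be written as

$$\delta\varphi = \frac{4\pi}{\lambda}\left[\frac{1}{d_2}(x_2\Delta x - h\Delta z) - \frac{1}{d_1}(x_1\Delta x - h\Delta z)\right] + \frac{4\pi}{\lambda}\left[\frac{z_2}{d_2}\left(\Delta z + \frac{hx_2\Delta x}{d_2^2} - \frac{h^2\Delta z}{d_2^2}\right) - \frac{z_1}{d_1}\left(\Delta z + \frac{hx_1\Delta x}{d_1^2} - \frac{h^2\Delta z}{d_1^2}\right)\right]. \qquad (15)$$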
The first term on the right hand side of (15) is independent of z₁ and z₂; it is the
phase difference associated with the flat earth surface and can be termed δφ₀, while the
second term, which depends on z₁ and z₂, is due to the terrain elevation and is termed
δφ'.

Therefore, δφ can be written as

$$\delta\varphi = \delta\varphi_0 + \delta\varphi'. \qquad (16)$$
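The split in (16) can be made concrete by evaluating the two brackets of (15) separately. The sketch below does this under stated assumptions: λ corresponds to the 5.3 GHz ERS-1 frequency and h = 790 km is the altitude quoted in section 3, while the baseline (Δx = 100 m, Δz = 50 m), the 1 km point separation and the 20 m elevation are hypothetical values chosen only to show magnitudes. The function names are illustrative and not from the paper.

```python
import math

C = 299_792_458.0  # speed of light, m/s


def flat_earth_phase(lam, h, x1, x2, dx, dz):
    """delta-phi_0: first bracket of eq. (15) for flat-earth points x1, x2 (metres from nadir)."""
    d1, d2 = math.hypot(h, x1), math.hypot(h, x2)
    return (4.0 * math.pi / lam) * ((x2 * dx - h * dz) / d2 - (x1 * dx - h * dz) / d1)


def terrain_phase(lam, h, x1, z1, x2, z2, dx, dz):
    """delta-phi': second bracket of eq. (15), caused by the elevations z1 and z2."""
    d1, d2 = math.hypot(h, x1), math.hypot(h, x2)
    t2 = (z2 / d2) * (dz + h * x2 * dx / d2**2 - h**2 * dz / d2**2)
    t1 = (z1 / d1) * (dz + h * x1 * dx / d1**2 - h**2 * dz / d1**2)
    return (4.0 * math.pi / lam) * (t2 - t1)


lam, h = C / 5.3e9, 790e3
dx, dz = 100.0, 50.0  # hypothetical baseline components (m)
phi0 = flat_earth_phase(lam, h, 300e3, 301e3, dx, dz)
phi_t = terrain_phase(lam, h, 300e3, 0.0, 301e3, 20.0, dx, dz)
print(f"flat-earth phase over 1 km: {phi0:.2f} rad ({phi0 / (2 * math.pi):.2f} fringes)")
print(f"extra phase from a 20 m elevation at the second point: {phi_t:.2f} rad")
```

Dividing δφ₀ by 2π gives the number of flat-earth fringes between the two points, which is the quantity the fringe-spacing formulation below relies on.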


For a flat earth area in the image, δφ' = 0. Selecting two consecutive fringe points in the
flat earth interferogram gives δφ₀ = 2π. In this case, (15) reduces to

$$\Delta x\left(\frac{x_2}{d_2} - \frac{x_1}{d_1}\right) + \Delta z\,h\left(\frac{1}{d_1} - \frac{1}{d_2}\right) = \frac{\lambda}{2}. \qquad (17)$$


Δx and Δz can be evaluated by assuming, under one condition, that Δz is zero and, under the
second, that Δx = 0.

Case i: When Δz = 0, (17) reduces to

$$\Delta x = \frac{\lambda d_1d_2}{2(x_2d_1 - x_1d_2)}. \qquad (18)$$

This is the same expression as that derived by Lin et al (1991). In this case, the vertical shift of the
satellite is assumed to be zero.

Case ii: When Δx = 0, (17) reduces to

$$\Delta z = \frac{\lambda d_1d_2}{2h(d_2 - d_1)}. \qquad (19)$$
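Equations (18) and (19) can be evaluated directly. The sketch below is an illustration under stated assumptions, not the authors' code: it takes the ERS-1 C-band frequency of 5.3 GHz and h = 790 km quoted in section 3, places the first fringe at x₁ = 300 km on flat ground, and treats the spacing to the next fringe, x₂ − x₁, as a free (hypothetical) parameter. The function names are hypothetical.

```python
import math

C = 299_792_458.0  # speed of light, m/s


def slant(h, x):
    """d = sqrt(h^2 + x^2): satellite-to-point distance for a flat-earth point x metres from nadir."""
    return math.hypot(h, x)


def horizontal_baseline(lam, h, x1, x2):
    """Eq. (18): horizontal shift when the vertical shift is zero."""
    d1, d2 = slant(h, x1), slant(h, x2)
    return lam * d1 * d2 / (2.0 * (x2 * d1 - x1 * d2))


def vertical_baseline(lam, h, x1, x2):
    """Eq. (19): vertical shift when the horizontal shift is zero."""
    d1, d2 = slant(h, x1), slant(h, x2)
    return lam * d1 * d2 / (2.0 * h * (d2 - d1))


lam = C / 5.3e9          # ERS-1 C-band wavelength, about 5.66 cm
h, x1 = 790e3, 300e3     # altitude and first-fringe distance from nadir (m)
for spacing in (50.0, 100.0, 200.0, 400.0):   # hypothetical fringe spacings (m)
    x2 = x1 + spacing
    print(f"fringe spacing {spacing:5.0f} m: "
          f"dx = {horizontal_baseline(lam, h, x1, x2):8.1f} m (dz = 0), "
          f"dz = {vertical_baseline(lam, h, x1, x2):8.1f} m (dx = 0)")
```

Both shifts fall as the fringe spacing grows, which is the trend reported for figure 2 in section 3.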

Hence, with prior knowledge of the type of repeat orbit of the satellite, the precise value
of the horizontal or vertical baseline vector can be evaluated using the distance between
two consecutive fringes of the phase interferogram of the flat earth. However, in reality,
the satellite shift is usually two-dimensional, with both Δx and Δz components. To calculate
Δx and Δz, three consecutive fringe points of the phase interferogram of the flat earth can
be used to develop the mathematical formulation. For this, points Q and R (figure 1) can be
selected as fringe points for the flat earth, where δφ₀ = 2π, and, in analogy with (17), the
following equation can be derived:

$$\Delta x\left(\frac{x_3}{d_3} - \frac{x_2}{d_2}\right) + h\,\Delta z\left(\frac{1}{d_2} - \frac{1}{d_3}\right) = \frac{\lambda}{2}. \qquad (20)$$

Now, (17) and (20) can be written in the form

$$\begin{aligned}
\Delta x\,A_1 + \Delta z\,B_1 &= c, \\
\Delta x\,A_2 + \Delta z\,B_2 &= c,
\end{aligned} \qquad (21)$$

where

$$A_1 = \frac{x_2}{d_2} - \frac{x_1}{d_1}, \quad A_2 = \frac{x_3}{d_3} - \frac{x_2}{d_2}, \quad B_1 = h\left(\frac{1}{d_1} - \frac{1}{d_2}\right), \quad B_2 = h\left(\frac{1}{d_2} - \frac{1}{d_3}\right), \quad c = \frac{\lambda}{2}.$$

Figure 2. Variation of baseline with distance between fringe lines.


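Equation (21) can be solved for Δx and Δz to give

$$\Delta x = \frac{c(B_2 - B_1)}{A_1B_2 - A_2B_1}, \qquad (22)$$

$$\Delta z = \frac{c(A_2 - A_1)}{B_1A_2 - B_2A_1}. \qquad (23)$$

By substituting the values of the coefficients and simplifying (22) and (23), it can be shown
that

$$\Delta x = \frac{\lambda}{2}\left\{\frac{2d_1d_3 - d_1d_2 - d_2d_3}{x_1(d_2 - d_3) + x_2(d_3 - d_1) + x_3(d_1 - d_2)}\right\}, \qquad (24)$$

$$\Delta z = \frac{\lambda}{2h}\left\{\frac{x_1d_2d_3 - 2x_2d_1d_3 + x_3d_1d_2}{x_1(d_3 - d_2) + x_2(d_1 - d_3) + x_3(d_2 - d_1)}\right\}. \qquad (25)$$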

Equations (24) and (25) give estimates of the two-dimensional orbital shifts (Δx and
Δz) for SAR interferometry from the knowledge of three consecutive fringe points in the
interferogram of the flat earth.
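The two-dimensional solution can be sketched the same way. The code below is again a hypothetical illustration with assumed ERS-like inputs (λ for 5.3 GHz, h = 790 km, fringes at 300.0 km, 300.1 km and 300.2 km from nadir). It solves (21) by Cramer's rule, as in (22) and (23), and separately evaluates the closed forms (24) and (25), so that the two routes can be compared. Both routes subtract nearly equal quantities, so the results are sensitive to small errors in the fringe positions.

```python
import math

C = 299_792_458.0  # speed of light, m/s


def coefficients(h, x1, x2, x3):
    """A1, A2, B1, B2 of eq. (21) for flat-earth fringe points x1 < x2 < x3 (metres from nadir)."""
    d1, d2, d3 = (math.hypot(h, x) for x in (x1, x2, x3))
    a1 = x2 / d2 - x1 / d1
    a2 = x3 / d3 - x2 / d2
    b1 = h * (1.0 / d1 - 1.0 / d2)
    b2 = h * (1.0 / d2 - 1.0 / d3)
    return a1, a2, b1, b2


def shifts_cramer(lam, h, x1, x2, x3):
    """Eqs. (22)-(23): solve the 2x2 system (21) with c = lambda/2."""
    a1, a2, b1, b2 = coefficients(h, x1, x2, x3)
    c = lam / 2.0
    dx = c * (b2 - b1) / (a1 * b2 - a2 * b1)
    dz = c * (a2 - a1) / (b1 * a2 - b2 * a1)
    return dx, dz


def shifts_closed_form(lam, h, x1, x2, x3):
    """Eqs. (24)-(25): the same solution written directly in terms of x_i and d_i."""
    d1, d2, d3 = (math.hypot(h, x) for x in (x1, x2, x3))
    dx = (lam / 2.0) * (2 * d1 * d3 - d1 * d2 - d2 * d3) / (
        x1 * (d2 - d3) + x2 * (d3 - d1) + x3 * (d1 - d2))
    dz = (lam / (2.0 * h)) * (x1 * d2 * d3 - 2 * x2 * d1 * d3 + x3 * d1 * d2) / (
        x1 * (d3 - d2) + x2 * (d1 - d3) + x3 * (d2 - d1))
    return dx, dz


lam, h = C / 5.3e9, 790e3
x1, x2, x3 = 300e3, 300.1e3, 300.2e3   # hypothetical, equally spaced 100 m apart
print("Cramer (22)-(23):      dx = %.2f m, dz = %.2f m" % shifts_cramer(lam, h, x1, x2, x3))
print("closed form (24)-(25): dx = %.2f m, dz = %.2f m" % shifts_closed_form(lam, h, x1, x2, x3))
```

Re-running with slightly unequal spacings is a simple way to explore the sensitivity to fringe positions discussed in section 3.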


3. Results and discussions 

Figure 2 shows the variation of the estimated horizontal or vertical baseline with increasing
distance between two consecutive fringe lines of the phase interferogram for a 5.3 GHz
INSAR (ERS-1 C-band SAR) at a distance of 300 kilometres from the nadir (x₁), computed using
(18) and (19). It is observed that as the distance between fringe lines of the
flat earth increases, the horizontal or vertical shift of the satellite in the repeat orbit decreases. The
baseline vector also depends on the frequency of operation of the INSAR. Hence,
by knowing the fringe-line distance, either the vertical shift or the horizontal shift of the satellite
repeat orbit can be estimated precisely, and this can be used to develop a topographic map of
the earth.

Figure 3. Variation of orbital shift for ERS C-band INSAR (h = 790 km, x₁ = 300 km).

Figures 3 and 4 depict the variation of orbital shift (Δx and Δz) with distance between
two consecutive fringe lines for C-band ERS INSAR at distances from the nadir point of
300 km and 400 km respectively. It is observed that the Δx and Δz orbital shifts decrease very
fast with increasing distance between two consecutive fringe lines (ΔFL) up to 150 m.
Beyond this ΔFL, the orbital shifts decrease slowly. It is also observed from both figures that
the vertical orbital shift (Δz) and the horizontal orbital shift (Δx) are different for x₁ = 300 km
and x₁ = 400 km at any value of ΔFL. In this discussion, equal spacing of fringe lines is
assumed, i.e., ΔFL = ΔFL₁ = ΔFL₂, where ΔFL₁ is the distance between the first two
fringe points and ΔFL₂ is the distance between the second and third fringe points.

Figure 4. Variation of orbital shift for ERS C-band INSAR (h = 790 km, x₁ = 400 km).

Figure 5. Variation of orbital shifts with distance from nadir point for three equally
spaced consecutive fringe lines.

Figure 5 illustrates the variation of the orbital shifts Δx and Δz with distance from the
nadir point (x₁) for ΔFL = 60 m and ΔFL = 100 m in the interferogram of the flat
earth surface of C-band ERS INSAR. It is observed that the horizontal shift decreases and
the vertical shift increases with distance from the nadir point for both values of ΔFL. The Δx and Δz
variations for a particular ΔFL form a diverging pattern as the distance from the nadir point
increases. This divergence between the Δx and Δz curves decreases
as ΔFL increases, for the interferogram of the flat earth surface.

Figure 6 shows the variation of the orbital shifts (Δx and Δz) with relative change
in (ΔFL₁, ΔFL₂) for a point at a distance of 300 km from the nadir in the interferogram
of a flat earth surface. It is found that the orbital shifts are very sensitive to the values of ΔFL₁
and ΔFL₂. In this figure, it is assumed that the distance between the second and third
consecutive fringe lines (ΔFL₂) is 200 m and the distance between the first and second
fringe lines (ΔFL₁) varies from 199.95 to 200.1 m. It is found that Δx decreases and Δz
increases very fast with a slight increase of ΔFL₁ while ΔFL₂ is kept constant. Hence, the
resultant orbital vector [(Δx² + Δz²)^{1/2}] increases with increase in the value
of ΔFL₁, as does the angle of the resultant vector with the horizontal. Therefore, exact
knowledge of the position of three consecutive fringe lines in the interferogram of the flat earth
surface is essential for prediction of the precise satellite orbital shift in repeat pass INSAR.

Figure 6. Effect on orbital orientation due to variation in the distance between three
consecutive fringe lines (horizontal axis: distance between consecutive fringe lines, m).

Figure 7. Variation of orbital shifts for SEASAT INSAR (x₁ = 350 km).
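The sensitivity described above can be explored with a short sketch. It assumes, hypothetically, that the first fringe sits at x₁ = 300 km, that ΔFL₂ = 200 m is held fixed and that ΔFL₁ is varied around 200 m, with λ for 5.3 GHz and h = 790 km. These choices are illustrative and are not taken from the authors' computation. The closed forms (24) and (25) are re-implemented under a hypothetical function name so that the block runs on its own.

```python
import math

C = 299_792_458.0  # speed of light, m/s


def shifts_from_fringes(lam, h, x1, x2, x3):
    """Closed forms (24)-(25) for three flat-earth fringe points (metres from nadir)."""
    d1, d2, d3 = (math.hypot(h, x) for x in (x1, x2, x3))
    dx = (lam / 2.0) * (2 * d1 * d3 - d1 * d2 - d2 * d3) / (
        x1 * (d2 - d3) + x2 * (d3 - d1) + x3 * (d1 - d2))
    dz = (lam / (2.0 * h)) * (x1 * d2 * d3 - 2 * x2 * d1 * d3 + x3 * d1 * d2) / (
        x1 * (d3 - d2) + x2 * (d1 - d3) + x3 * (d2 - d1))
    return dx, dz


lam, h, x1 = C / 5.3e9, 790e3, 300e3
fl2 = 200.0                                          # second spacing, held fixed (m)
for fl1 in (199.95, 199.98, 200.0, 200.02, 200.05):  # first spacing, varied (m)
    x2 = x1 + fl1
    x3 = x2 + fl2
    dx, dz = shifts_from_fringes(lam, h, x1, x2, x3)
    angle = math.degrees(math.atan2(dz, dx))          # orientation of the baseline
    print(f"dFL1 = {fl1:7.2f} m: dx = {dx:8.2f} m, dz = {dz:8.2f} m, "
          f"angle from horizontal = {angle:6.1f} deg")
```

In this sketch a change of a few centimetres in ΔFL₁ moves Δx and Δz by tens of metres. The closed forms are ratios of small differences of large quantities, so small errors in the fringe positions are strongly amplified.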
Figure 7 shows the variation of orbital shift with increasing distance between two
fringe lines (ΔFL) for L-band SEASAT SAR at a distance x₁ = 350 km. It is observed
that Δx and Δz decrease rapidly up to ΔFL = (400 m, 400 m) and, beyond this value,
decrease slowly. The orbital shift depends upon ΔFL, x₁ and frequency. It is also found from
figure 4 that, for the same value of orbital shift, the required distance between consecutive
fringe lines (ΔFL) is smaller for C-band INSAR than for L-band SEASAT INSAR.

Figure 8 illustrates the variation of orbital shift with x₁ for ΔFL = (200 m, 200 m) and
ΔFL = (500 m, 500 m) for L-band SEASAT INSAR. Here the vertical shift increases and
the horizontal shift decreases with increasing distance of the fringe point from the nadir (x₁). It
is also observed that the slopes of the Δz and Δx curves decrease as ΔFL increases
from (200 m, 200 m) to (500 m, 500 m). In general, at any fringe
location (x₁), the orbital shift is higher for ΔFL = (200 m, 200 m) than for ΔFL = (500 m,
500 m).

Figure 8. Variation of orbital shifts with distance from nadir point for three equally
spaced consecutive fringe lines.

Figure 9. Effect on orbital orientation due to variation in the distance between three
consecutive fringe lines.

Figure 9 depicts the effect of (ΔFL₁, ΔFL₂) on the orbital shifts in the interferogram of a
plane earth for SEASAT INSAR. In this case, ΔFL₂ (2000 m) is taken to be constant
and ΔFL₁ varies relative to ΔFL₂ from 1996 m to 2008 m. It is observed that a
small change in the value of ΔFL₁ relative to ΔFL₂ changes the orbital shifts (Δx and Δz)
drastically in orientation. It is found that Δx decreases and Δz increases with change in
ΔFL₁ relative to ΔFL₂. Hence, knowledge of the exact distance between three consecutive
fringe lines of the interferogram of a flat earth is essential to obtain precise values of Δx
and Δz. The resultant orbital vector increases with increase in the value of ΔFL₁
relative to that of ΔFL₂.


4. Conclusion 

In this paper, mathematical formulations have been developed for evaluation of one-dimensional
(horizontal or vertical) and two-dimensional orbital shifts of repeat pass
INSAR based on the knowledge of two or three consecutive fringe lines in the interferogram
of a flat earth. Precise knowledge of the orbital shifts (Δx or Δz, or Δx and Δz)
is required for terrain mapping through INSAR. It is found that knowing
the distance between three consecutive fringe lines is very useful for obtaining the exact
orientation of the orbital shift, which depends upon frequency, the distance of the fringe-line
location from the nadir point and the distance between two fringe lines (ΔFL). Orbital orientation
plays an important role in the mathematical formulation used for evaluation of the terrain
elevation map using repeat pass INSAR, and this formulation will be different for one- and two-dimensional
orbital shifts.


The authors are grateful to Shri V P Sandlas for constant encouragement and fruitful
discussions. Thanks are also due to Dr K K Jha for critical comments and review. 
Computational structural mechanics


Foreword 

Advances in computer science and technology have had a profound influence on structural
engineering. A new discipline called computational structural mechanics (CSM) has
emerged and a huge software industry has grown along with it. CSM has virtually developed
out of a technique called the finite element method (FEM). Powerful general
purpose FEM packages in the Computer-Aided-Design/Computer-Aided-Manufacturing 
cycle automate the use of structural analysis techniques to check designs quickly for safety, 
integrity, reliability and economy. Very large structural calculations can be performed to 
account for complex geometry, loading history and material behaviour. Such calculations 
are now routinely performed in aerospace, automotive, civil and mechanical engineering, 
oil and nuclear industries. 

This special issue is dedicated to recent developments in finite element methodology 
leading to the design, validation and use of sophisticated software algorithms to generate 
structural information, and to assemble and solve them, maintaining careful book-keeping 
of all data throughout, and subsequently to process the results in a manner helpful to the 
designer for decision making. 

Most finite elements in use today in general purpose packages are based on the minimum 
total potential principle (displacement elements). In recent years, multifield variational 
principles have been seen to provide a broader conceptual framework to interpret the 
method. The first paper “Finite element analysis and the stress correspondence paradigm” 
deals with the paradigm change that follows when the Hu-Washizu basis is used instead of 
the minimum total potential basis for the interpretation of how the finite element method 
works. 

Considerable progress has taken place in developing computational techniques for nonlinear
static and postbuckling problems and in their application to damage assessment in composite
structures. The second paper, by B P Naganarayana and S N Atluri, on computational
modelling and analysis of interactive buckling and delamination growth in composite structures
reviews this subject.

An important design driver in aerospace applications is the behaviour of structural 
parts under cyclic loading. The third paper by B Dattaguru, on the role of elasto-plastic 
analysis under cyclic loading in fatigue crack growth analysis, shows how this is performed. 
The fourth paper, by Dipak K Maiti and P K Sinha, on “Low velocity impact analysis 
of composite sandwich shells using higher-order shear deformation theories” addresses 
another topic of design interest. 


The modelling of complex structures using pre-processing and high resolution, high 
throughput graphic devices and integration of analysis programs into CAD/CAM systems 
requires automated mesh generation and subsequent regeneration - how this can be done 
adaptively is discussed by C S Krishnamoorthy and S Mukherjee in their paper titled 
“Adaptive finite element analysis with quadrilateral elements using a new h-refinement
strategy.” 

Although India has made useful contributions in areas related to element technology, i.e. 
the design of mechanics algorithms for a wide range of structural and thermomechanical 
problems, not much has been done in the way of integration of these mechanics algorithms 
and solution capabilities with pre- and post-processing packages into general purpose 
packages. These large packages are now commonplace in a wide spectrum of engineering 
activity in the country but, sadly, most of this software is acquired from abroad. There has 
not been a serious commitment to the design and maintenance of a package from within 
the country. A medium-sized package for structural analysis, named FEPACS, for analysis 
of composite structures is being developed by the National Aerospace Laboratories at 
Bangalore. The last paper by B P Naganarayana, Gangan Prathap and B R Somashekar, 
titled “FEPACS: A computational tool for linear structural analysis” shows how the
C-concepts are used as the basis for developing a library of elements and discusses the steps
taken to enhance the general purpose capabilities of the FEPACS package. 
Finite element analysis and the stress correspondence
paradigm 

GANGAN PRATHAP 

National Aerospace Laboratories, Bangalore 560017, India, and 

Jawaharlal Nehru Centre for Advanced Scientific Research, Bangalore 560 064, 

India 

Abstract. The underlying mechanics of the finite element method as applied 
to structural analysis is explored in paradigmatic terms. It is shown that the 
stress correspondence paradigm has the most explanatory power and that it can 
be axiomatized from a very basic principle, the Hu-Washizu theorem, which is 
a variation of the least action principle. Numerical experiments are presented 
to show that the predictions based on analytical quantification from the stress 
correspondence paradigm are verifiable. 

Keywords. Finite element analysis; stress correspondence paradigm; displacement
correspondence paradigm; structural analysis; axiomatization; philosophy
of science.


1. Introduction 

Finite Element Analysis (FEA) is a remarkable technological and commercial success 
originating out of collective acts of human ingenuity, skill and craft. A hundred thousand 
or more engineers, technicians, teachers and students routinely use finite element analysis 
packages (of which there are nearly 1500 codes ranging from small dedicated in-house programmes
to large general purpose mega-line codes) in design, analysis, teaching or study
environments. There are billions of dollars worth of installed software and hardware dedicated
to finite element analysis all over the world and perhaps billions of dollars are spent
on analysis costs using this software every year. The primary archival literature has grown 
rapidly and at the last count there were more than 50,000 papers on the subject (excluding 
papers on fluid mechanics), and nearly 3800 papers on it are published annually (Mackerle 
1995). There are about 400 textbooks and primers, about 400 conference proceedings and 
perhaps thousands of handbooks, course notes and documentation manuals. 

The Finite Element Method (FEM) offers an excellent example of how a body of knowledge
first emerges out of the ingenious art and craft of practising engineers, then takes
shape as modes and lines of rational enquiry are set up, and is finally shown to have
a scientific basis. FEM is now formally over thirty-five years old (the terminology ‘finite
element’ having been coined in 1960). As it is understood both by its own practitioners
and by all laymen who are, even if only remotely, aware of its scope and potential, FEM is
an approximate method of solving problems that arise in science and engineering. It
originated and grew as such, by intuition and inspired guess, by hard work and trial and
error. Its origins can be traced to aeronautical and civil engineering practice, mainly from
the point of view of structural engineering. Today, it can be used, with clever variations,
to solve a wide variety of problems in science and engineering.

However, my experience with students, teachers, technicians and engineers over
two decades of interaction is that many, if not most, are oblivious to the basic principles
that drive the method. All of them can understand the ‘first-order’ tradition of FEM:
what goes into the packages, what comes out, how to interpret results and so on. But few
could put a finger on why the method does what it does; this ‘second-order’ tradition, to
borrow a phrase from the exquisitely crafted philosophy of Sir Karl Popper (Magee …),
of critically discussing the myths and the metaphysics of the method is available to
few.

To understand why this is so, let us briefly review the stages through which the discipline
grew. The earliest and largely technological stages are what we can call the ‘hands-on’
and ‘handle-turning’ stages of the enterprise: the design, use and re-design of algorithms
and software on a trial and error basis (hands-on experience) and the drudgery-filled
computation (handle-turning) phase of validation and finally production run analysis. B[…]
gave way to Science very slowly and very reluctantly. This sadly neglected and u[…]
third stage of the whole exercise, the ‘hand-waving’ stage as one may call it, is where
the myths and superstitions of the method are created and then resolved as Science.
[…] science is myth-making just as religion is, said Karl Popper. But scientific myths are
different because one adopts a critical, argumentative attitude to these myths so that they
“change, [changing] in the direction of giving a better and better account of the world”
(Karl Popper). In this paper, we shall explore one such ‘hand-waving’ aspect of FEM to
learn how the method works, from the point of view of steady progression of myth to
myth.

There are very good reasons why the Science of FEM emerged in […]
and uncertain steps. It may serve us well to realize that the finite element method
progressed as far as it did precisely because there was more ‘art’ and ‘engineering’ and
little mathematical rigour and less ‘science’ in the early years of its development. The
invention of the method by engineers in very intuitive ways was the heroic phase of the
subject, led entirely by bold pioneers. As Robert M Pirsig described it so graphically in
his Zen and the art of motorcycle maintenance: “Pioneers [are] invariably, by their nature,
mess makers. They go forging ahead, seeing only their noble, distant goal, and never notice
any of the crud and debris they leave behind. Someone else gets to clean that up.” Now that
the noble, distant goal has been fully realized, it is the right time to clean up. This will
be the main task of this paper.

The particular issue that we shall grapple with here is: What does the finite element
method set out to do in structural analysis? Does it first compute displacements in a
structural domain and then derive stresses from these, as is the most commonly held
view? I call this the displacement correspondence (DC) paradigm. Or does it intrinsically
sample stresses first and produce displacement fields only in a secondary manner? This is the
stress correspondence (SC) paradigm that is being proposed here. The complete burden
of proof, as it is carried by rational argument, is presented here to demonstrate that one
paradigm has more explanatory power than the other.


2. Paradigms, some approximate solutions, and derivation from a basic principle 
2.1 Introduction 

A continuum problem in structural or solid mechanics can be described either by a set
of partial differential equations and boundary conditions or by a functional Π based on
the energy principle, whose extremum describes the equilibrium state of the problem. A
continuum has infinitely many material points and therefore infinitely many degrees
of freedom. Thus, a solution is complete only if analytical functions can be found for
the displacement and stress fields which describe these states exactly everywhere in the
domain of the problem. It is not difficult to imagine that such solutions can be found only
for a few problems.

We also know that the Rayleigh-Ritz (RR) and finite element (FEM) approaches offer
ways in which approximate solutions can be obtained without the need to solve the
differential equations or boundary conditions exactly. This is managed by performing the
discretization operation directly on the functional. Thus, a real problem with an infinitely
large number of degrees of freedom is replaced with a computational model having a finite
number of degrees of freedom. In the RR procedure, the solution is approximated by using
a finite number of admissible functions fᵢ and a finite number of degrees of freedom aᵢ,
so that the approximate displacement field is represented by a linear combination of these
functions using the unknown constants. In the FEM process, this is done in a piecewise
manner: over each sub-region (element) of the structure, the displacement field is approximated
by using shape functions Nᵢ within each sub-region and nodal degrees of freedom
uᵢ at nodes strategically located so that they connect the elements together without generating
gaps or overlaps. The functional now becomes a function of the degrees of freedom
(aᵢ or uᵢ, as the case may be). The equilibrium configuration is obtained by applying the
criterion that Π must be stationary with respect to the degrees of freedom.

It is assumed that this solution process of seeking the stationary or extremum point of the 
discretized functional will determine the unknown constants such that these will combine 
together with the admissible or shape functions to represent some aspect of the problem to 
some ‘best’ advantage. Which aspect this actually is has been a matter of some intellectual 
speculation. Three competing paradigms present themselves. 

It is possible to believe that by ‘best’ we mean that the functions tend to satisfy the differential
equations of equilibrium and the stress boundary conditions more and more closely
as more terms are added to the RR series or more elements are added to the structural
mesh. The second school of thought believes that it is displacements which are approximated
to greater accuracy with improved idealization. The displacement correspondence
paradigm belongs to this school. It follows from this that stresses, which are computed as
derivatives of the approximate displacement fields, will be less accurate. Here, however,
we will seek to establish the currency of a third paradigm: that the RR or FEM process
actually seeks to determine, to best advantage, the state of stress or strain in the structure.
In this stress correspondence paradigm, the displacement fields are computed from the
‘best-fit’ stresses as a consequence.

Before we enter into a detailed examination of the merits or faults of each of these
paradigms, we shall briefly introduce a short statement on what is meant by the use of
the term ‘paradigm’ in the present context. We shall follow this by examining a series of
simple approximations to the cantilever bar problem, but with more and more complex
loading schemes, to see how the overall picture emerges.

2.2 What is a ‘paradigm’?

Before we proceed further it may be worthwhile to state what we mean by a paradigm
here. This is a word that is uncommon in the vocabulary of a trained engineer. The dictionary
meaning of paradigm is pattern or model or example. This does not convey much
in the present context. Here, we use the word in the greatly enlarged sense in which the
philosopher T S Kuhn (1962) introduced it in his classic study on scientific progress. In this
sense, a paradigm is a “framework of suppositions as to what constitutes problems, theories
and solutions”. It can be a collection of metaphysical assumptions, heuristic models,
commitments, values and hunches, which are all shared by a scientific community and which
provide the conceptual framework within which they can recognize problems and solve
them (Dasgupta 1994). The DC and SC paradigms can be thought of as two competing scenarios
which attempt to explain how the finite element method computes displacements,
strains and stresses. Our task will therefore be to establish which paradigm has greater
explanatory power and range of application. Before we take up this task, let us work out
a few simple problems. This is the usual epistemological sequence in which learning and
experience reinforce our acceptance of one paradigm over the other.

2.3 Bar under uniformly distributed axial load 

Consider a cantilever bar subjected to a uniformly distributed axial load of intensity $q_0$
per unit length (figure 1). Starting with the differential equation of equilibrium, it is easy
to show that the analytical solution to the problem is

$$u(x) = \frac{q_0}{EA}\left(Lx - \frac{x^2}{2}\right),$$

$$\sigma(x) = \frac{q_0}{A}\,(L - x).$$

Consider a one-term RR solution based on $u_r = a_1x$, where the subscript r denotes the
use of the RR approach. It can be shown that the approximate solution obtained is

$$u_r(x) = \frac{q_0}{EA}\,\frac{Lx}{2},$$

$$\sigma_r(x) = \frac{q_0}{A}\,\frac{L}{2}.$$
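The algebra behind the one-term RR result is short enough to show. The sketch below assumes the usual total potential of an axially loaded bar, $\Pi = \int_0^L \tfrac{EA}{2}(du/dx)^2\,dx - \int_0^L q_0\,u\,dx$, with the bar fixed at $x = 0$ and free at $x = L$:

$$\Pi(a_1) = \frac{EAL}{2}\,a_1^2 - \frac{q_0L^2}{2}\,a_1, \qquad \frac{d\Pi}{da_1} = EAL\,a_1 - \frac{q_0L^2}{2} = 0 \;\Longrightarrow\; a_1 = \frac{q_0L}{2EA},$$

so that $u_r = a_1x = (q_0/EA)(Lx/2)$ and $\sigma_r = E\,du_r/dx = q_0L/(2A)$, which coincides with the exact stress only at $x = L/2$.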

An FEM solution based on a two-noded linear element produces

$$u_f(x) = \frac{q_0}{EA}\,\frac{Lx}{2},$$

$$\sigma_f(x) = \frac{q_0}{A}\,\frac{L}{2}.$$
Figure 1. Bar under uniformly distributed axial load, $q(x) = q_0$: one-term RR and
one two-node element solution.


We see that the RR and FEM solutions are identical. This is to be expected because the
FEM solution is effectively an RR solution. We may also note the curious coincidence
that all three solutions predict the same displacement at the tip. However, from figure 1
we can see that $u_r$ and $u_f$ are only approximate solutions to $u$. It is also clear from
figure 1 that $\sigma_r = \sigma_f = \sigma$ at the mid-point of the bar. It is possible to speculate
that $\sigma_r$ and $\sigma_f$ bear some unique relationship to the true variation $\sigma$.

Consider next what will happen if two equal-length linear (i.e., two-noded) bar elements
are used to model the bar. The solution described in figure 2 will be obtained. First, we must
note that the distributed axial load is consistently lumped at the nodes. Thus the physical
load system that the FEM equations are solving is not the one described in figures 1 and 2
as $\sigma$. Instead, we must think of a substitute stairstep distribution $\bar\sigma_f$, produced by
the consistent load lumping process, which is what the FEM stiffness matrix senses. A
solution of the resulting set of algebraic equations then gives $\sigma_f$ and $u_f$ as the FEM solution.

We see once again that the nodal predictions are exact. This is only a coincidence for
this particular type of problem, and nothing more can be read into this fact. More striking is
the observation that the stresses computed by the finite element system now approximate
the original true stress in a stairstep fashion.

Figure 2. Bar under uniformly distributed axial load: two two-node element solution.

It also seems reasonable to conclude that within each element, the true state of stress is 
captured by the finite element stress in a ‘best-fit’ sense. In other words, we can generalize
from figures 1 and 2 that the finite element stress magnitudes are being computed according
to some precise rule. Also, there is the promise that by carefully understanding what this 
rule is, it will be possible to derive some unequivocal guidelines as to where the stresses 
are accurate. In this example, where an element capable of yielding constant stresses is 
used to model a problem where the true stresses vary linearly, the centroid of the element 
yields exact predictions. As we take up further examples later, this will become more firmly 
established. 
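The observations from figures 1 and 2 are easy to reproduce numerically. The following is a minimal sketch, not the author's program, that assembles N equal two-node bar elements with consistently lumped loads for the uniformly loaded cantilever; the function name `bar_two_node` and its default unit values of L, EA, A and q_0 are illustrative assumptions. It returns the nodal displacements and one constant stress per element, to be compared with the exact $u(x)$ and $\sigma(x)$ given at the start of § 2.3.

```python
import numpy as np


def bar_two_node(n_el, L=1.0, EA=1.0, A=1.0, q0=1.0):
    """Illustrative helper (our name): cantilever bar fixed at x = 0 under a uniform
    axial load q0, modelled with n_el equal two-node elements and consistent loads.
    Returns node coordinates, nodal displacements, element centroids, element stresses."""
    h = L / n_el
    n_nodes = n_el + 1
    K = np.zeros((n_nodes, n_nodes))
    F = np.zeros(n_nodes)
    k_el = (EA / h) * np.array([[1.0, -1.0], [-1.0, 1.0]])
    f_el = np.full(2, q0 * h / 2.0)               # consistent lumping of a uniform load
    for e in range(n_el):
        idx = [e, e + 1]
        K[np.ix_(idx, idx)] += k_el
        F[idx] += f_el
    u = np.zeros(n_nodes)
    u[1:] = np.linalg.solve(K[1:, 1:], F[1:])      # u = 0 at the fixed end
    x = np.linspace(0.0, L, n_nodes)
    sigma = (EA / A) * np.diff(u) / h              # E times the constant element strain
    x_mid = 0.5 * (x[:-1] + x[1:])
    return x, u, x_mid, sigma


x, u, x_mid, sigma = bar_two_node(2)
print(u)                  # compare with (q0/EA)(Lx - x**2/2) at the nodes
print(sigma, 1.0 - x_mid) # element stress vs exact (q0/A)(L - x) at the centroids
```

For any number of elements the nodal displacements agree with the exact solution, and each element stress equals the exact stress at the element centroid, which is the pattern described above.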

A cursory comparison of figures 1 and 2 also indicates that, in a general sense, the 
approximate displacements are more accurate than the approximate stresses. It seems 
compelling now to argue that this is so because the approximate displacements emerge as 
‘discretized’ integrals of the stresses or strains and, for that reason, appear more accurate 
than the stresses. 

2.4 Bar under linearly varying distributed axial load 

We now take up a slightly more difficult problem. The cantilever bar has a load distributed 
in a linearly varying fashion (figure 3). The exact stress distribution in this case will be 
quadratic in nature: 

$$\sigma(x) = \frac{q_0L^2}{8A}\left(\frac{4}{3} - 2\xi - \frac{1 - 3\xi^2}{3}\right). \qquad (4)$$

Some interesting features of this equation can be noted. A dimensionless coordinate,
$\xi = 2x/L - 1$, has been chosen so that it will also serve as a natural coordinate
system, taking on the values -1 and 1 at the ends of the single three-node bar element shown
modelling the entire bar in figure 3. We have also, very curiously, expanded the quadratic
variation using the terms $1$, $\xi$, $(1 - 3\xi^2)$. These can be identified with the Legendre
polynomials, and their relevance to the treatment here will become more obvious as we
proceed further.
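Because the load itself is specified only through figure 3, it may help to collapse (4) back into closed form. The following algebra is ours and assumes, as in § 2.3, that the bar is fixed at $x = 0$ and free at $x = L$:

$$\frac{4}{3} - 2\xi - \frac{1 - 3\xi^2}{3} = 1 - 2\xi + \xi^2 = (1 - \xi)^2, \qquad 1 - \xi = \frac{2(L - x)}{L},$$

$$\sigma(x) = \frac{q_0\,(L - x)^2}{2A}, \qquad q(x) = -A\,\frac{d\sigma}{dx} = q_0\,(L - x).$$

That is, (4) describes a load falling linearly from $q_0L$ at the root to zero at the free end, with total $q_0L^2/2$; this agrees with the sum of the consistent nodal loads $P_1 + P_2 + P_3$ quoted below.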

We shall postpone the first obvious approximation, that of using a one-term series
$u_r = a_1x$, till later. For now, we shall consider a two-term series $u_r = a_1x + a_2x^2$. This
is chosen so that the essential boundary condition at $x = 0$ is satisfied. No attempt is
made to satisfy the force boundary condition at $x = L$. By carrying out the necessary
algebra associated with the RR process, the solution obtained will yield an approximate
stress pattern given by

$$\sigma_r(x) = \frac{q_0L^2}{8A}\left(\frac{4}{3} - 2\xi\right). \qquad (5)$$

This is plotted in figure 3 as the dashed line. A comparison of (4) and (5) reveals an
interesting fact: only the first two Legendre polynomial terms are retained. Since the
Legendre polynomials are orthogonal, this means that in this problem we have obtained
$\sigma_r$ in a manner that seems to satisfy the following integral condition:

$$\int_{-1}^{1} \delta\sigma_r\,(\sigma_r - \sigma)\,d\xi = 0. \qquad (6)$$
Figure 3. Bar under linearly varying axial load: two-term RR and one three-node element
solution.


That is, the RR procedure has determined a $\sigma_r$ that is a ‘best-fit’ of the true state of stress
$\sigma$ in the sense described by the orthogonality condition in (6). This is the result anticipated
by the exercises we have conducted so far. We have not yet been able to derive it from any
general principle; for now it rests only on the evidence shown here.
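The ‘best-fit’ reading of (5) can be checked by projecting the exact stress onto a Legendre basis and truncating. This sketch uses NumPy's `numpy.polynomial.legendre` module, whose standard (normalized) Legendre polynomials differ from the unnormalized ones used here only by constant factors, so the truncation is unaffected. Stresses are in units of $q_0L^2/8A$, and `best_fit` is an illustrative helper name of ours.

```python
import numpy as np
from numpy.polynomial import legendre as leg

# Exact stress (4) in units of q0*L**2/(8*A): (1 - xi)**2 = 1 - 2*xi + xi**2,
# given as power-series coefficients (constant term first) in xi.
sigma_exact = np.array([1.0, -2.0, 1.0])
c = leg.poly2leg(sigma_exact)          # Legendre coefficients: [4/3, -2, 2/3]


def best_fit(order):
    """Illustrative helper: least-squares (orthogonal) projection onto polynomials of
    degree <= order. Keeps the first order + 1 Legendre coefficients and returns
    power-series coefficients."""
    return leg.leg2poly(c[: order + 1])


two_term = best_fit(1)    # linear stress of the two-term solution, cf. (5)
one_term = best_fit(0)    # constant stress of the one-term solution, cf. (10)

# The linear fit misses only the P2 term, so it crosses the exact stress where P2 = 0.
xi_barlow = np.sort(leg.legroots([0.0, 0.0, c[2]]))
# The constant fit 4/3 crosses (1 - xi)**2 at xi = 1 - sqrt(4/3) inside [-1, 1].
xi_opt = 1.0 - np.sqrt(4.0 / 3.0)

print(two_term, one_term, xi_barlow, xi_opt)
```

The two truncations reproduce the brackets of (5) and (10) below, and the crossing points are the $\pm 1/\sqrt{3}$ and $1 - (4/3)^{1/2}$ discussed in the following paragraphs.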

Let us now proceed to an FEM solution. It is logical to start here with the three-node
element that uses the quadratic shape functions $N_1 = \xi(\xi - 1)/2$, $N_2 = (1 - \xi^2)$ and
$N_3 = \xi(\xi + 1)/2$. We first compute the consistent loads to be placed at the nodes due to
the distributed load using $P_i = \int N_i\,q\,dx$. This results in the following scheme of loads
at the three nodes identified in figure 3: $P_1 = q_0L^2/6$, $P_2 = q_0L^2/3$ and $P_3 = 0$.
The resulting load configuration can be represented in the form of a stress system, shown
as $\bar\sigma_f$ by the chain-dotted lines in figure 3. Thus, any FEM discretization
automatically replaces a smoothly varying stress system by a step-wise system, as shown
by $\bar\sigma_f$ in figures 2 and 3. It is this step-wise system that the finite element solution $\sigma_f$
responds to. If the finite element computations are actually performed using the stiffness
matrix for the three-node element and the consistent load vector, it turns out, as the reader
can verify, that the computed FEM stress pattern will be

$$\sigma_f(x) = \frac{q_0L^2}{8A}\left(\frac{4}{3} - 2\xi\right). \qquad (7)$$

This is exactly the same as the $\sigma_r$ computed by the RR process in (5). At first sight, this
does not seem entirely unexpected. Both the RR and the FEM processes here have
started out with quadratic admissible functions for the displacement fields. This implies
that both can represent linear stress fields exactly, and can represent more complicated
stress fields by an approximate linear field that is in some sense a best approximation. On
second thought, however, there is a further subtlety to be taken care of. In the RR process,
the computed $\sigma_r$ was responding to a quadratically varying system $\sigma$ (see figure 3). We





Gangan Prathap 


could easily establish through (6) that $\sigma_r$ responded to $\sigma$ in a ‘best-fit’ manner. However,
in the FEM process, the load system that is being used is the $\bar\sigma_f$ system, which varies in
the stairstep fashion. The question confronting us now is: in what manner did $\sigma_f$ respond
to $\bar\sigma_f$? Is it also consistent with the ‘best-fit’ paradigm?

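Let us now assume an unknown field $\hat\sigma = c_0 + c_1\xi$ which is a ‘best-fit’ of the stairstep
field given by $\bar\sigma_f = q_0L^2/3A$ in $0 < x < L/2$ (the load $P_2 + P_3$ carried by the root half) and
$\bar\sigma_f = 0$ in $L/2 < x < L$. We shall determine the constants $c_0$ and $c_1$ so that the
‘best-fit’ variation shown below is satisfied:

$$\int_{-1}^{0} \delta\hat\sigma\left(\hat\sigma - \frac{q_0L^2}{3A}\right)d\xi + \int_{0}^{1} \delta\hat\sigma\,(\hat\sigma - 0)\,d\xi = 0. \qquad (8)$$

It can be worked out that this leads to

$$\hat\sigma(x) = \frac{q_0L^2}{8A}\left(\frac{4}{3} - 2\xi\right), \qquad (9)$$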

which is identical to the result obtained in (7) by carrying out the finite element process.
In other words, the FEM process follows exactly the ‘best-fit’ description of computing
stress fields. Another important lesson to be learnt from this exercise is that the consistent
lumping process preserves the ‘best-fit’ nature of the stress representation and the subsequent
prediction. Thus, $\sigma_f$ is a ‘best-fit’ of both $\sigma$ and $\bar\sigma_f$!
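Both halves of this argument, the three-node stiffness solution behind (7) and the least-squares fit (8) to the stairstep field, can be reproduced in a few lines. The sketch below is ours, not the author's program. It assumes unit values $L = A = E = q_0 = 1$ (so stresses come out in units of $q_0L^2/A$), the load $q(x) = q_0(L - x)$ implied by (4), and Gauss-Legendre quadrature from NumPy for every integral; the helper names are illustrative.

```python
import numpy as np
from numpy.polynomial.legendre import leggauss

# Assumed unit data: L = A = E = q0 = 1, so stresses are in units of q0*L**2/A.
L, E, q0 = 1.0, 1.0, 1.0
J = L / 2.0                         # dx/dxi for one element spanning the whole bar
g, w = leggauss(6)                  # exact for the low-order polynomials integrated here


def shape(s):
    """Quadratic shape functions N1, N2, N3 (nodes at xi = -1, 0, 1)."""
    return np.array([s * (s - 1) / 2, 1 - s**2, s * (s + 1) / 2])


def dshape(s):
    """Derivatives dN/dxi of the quadratic shape functions."""
    return np.array([s - 0.5, -2.0 * s, s + 0.5])


def load(s):
    """Load intensity q = q0 (L - x), with x = L (xi + 1) / 2."""
    return q0 * (L - J * (s + 1.0))


# Element stiffness (A = 1) and consistent load vector P_i = int N_i q dx.
K = sum(wi * E * np.outer(dshape(si), dshape(si)) / J for si, wi in zip(g, w))
P = sum(wi * shape(si) * load(si) * J for si, wi in zip(g, w))

u = np.zeros(3)
u[1:] = np.linalg.solve(K[1:, 1:], P[1:])     # u = 0 at the fixed end x = 0


def sigma_f(s):
    """FE stress E du/dx = E (dN/dxi . u) / J."""
    return E * dshape(s) @ u / J


def stairstep(s):
    """Stress in equilibrium with the nodal loads: P2 + P3 on the root half, P3 beyond."""
    return np.where(s < 0.0, P[1] + P[2], P[2])


# Least-squares fit c0 + c1*xi to the stairstep, as in (8); integrate each half separately.
halves = [((g - 1.0) / 2.0, w / 2.0), ((g + 1.0) / 2.0, w / 2.0)]
int_f = sum(np.sum(wh * stairstep(sh)) for sh, wh in halves)
int_xf = sum(np.sum(wh * sh * stairstep(sh)) for sh, wh in halves)
c0, c1 = int_f / 2.0, 1.5 * int_xf            # projections onto 1 and xi

print(P, sigma_f(-1.0), sigma_f(1.0), c0, c1)
```

Run as is, `P` comes out as $[1/6, 1/3, 0]$ (in units of $q_0L^2$), `sigma_f` reproduces (7), and the fitted $c_0 = 1/6$, $c_1 = -1/4$ are (9) written as $(1/8)(4/3 - 2\xi)$.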

It again seems reasonable to argue that the nodal displacements computed directly from
the stiffness equations, from which the stress field $\sigma_f$ has been processed, can actually be
thought of as being ‘integrated’ from the ‘best-fit’ stress approximation. Note that the
approximate solutions $\sigma_r$ or $\sigma_f$ intersect the exact solution $\sigma$ at two points. A comparison of
(4) with (5) and (7) indicates that these are the points where the quadratic Legendre
polynomial $(1 - 3\xi^2)$ vanishes, i.e., at $\xi = \pm 1/\sqrt{3}$. Such points are well known in the
finite element literature as optimal stress points or Barlow points. Our presentation
shows clearly why such points exist, and why in this problem, where a quadratic stress
state is approximated by a linear stress state, these points are at $\xi = \pm 1/\sqrt{3}$.

We shall now return to the linear Ritz admissible function, $u_r = a_1x$, to see if it
operates in the best-fit sense. This would be identical to using a single two-node bar
element to perform the same function. Such a field is capable of representing only a
constant stress, which must now approximate the quadratically varying stress field $\sigma(x)$
given by equation (4). This gives us an opportunity to observe what happens to the optimal
stress point: whether one exists, and whether it can easily be identified with a
Gauss integration point.

Again, the algebra is very simple and is omitted here. One can show that the one-term
approximate solution leads to the following computed stress:

$$\sigma_r(x) = \frac{q_0L^2}{8A}\cdot\frac{4}{3}. \qquad (10)$$

What becomes obvious by comparing this with the true stress $\sigma(x)$ in (4) and the computed
stress from the two-term solution, $\sigma_r(x)$ in (5), is that the one-term solution corresponds to
the constant part only of the Legendre polynomial expansion! Thus, given the orthogonal
nature of the Legendre polynomials, we can conclude that we have obtained the ‘best-fit’
state of stress even here. Also, it is clear that the optimal stress point does not coincide
with any of the points corresponding to the various Gauss integration rules.
The optimal point here is given by $\xi = 1 - (4/3)^{1/2}$.
Table 1. The conceptual frameworks for the displacement correspondence (DC) and stress
correspondence (SC) paradigms.

| Displacement correspondence paradigm | Stress correspondence paradigm |
|---|---|
| Displacements at nodes are matched | Stresses at optimal points are matched |
| Stresses computed as derivatives from displacements | Displacements “integrated” from stresses |
| Differentiation produces functions which are less accurate than original functions | Integration produces functions which are more accurate than original functions |
| ∴ displacements are accurate; stresses are less accurate | ∴ stresses are accurate; displacements are more accurate (on average) |
| ∴ if there are points where stresses are very accurate, this is a consequence of the mean value theorem | ∴ if there are points where stresses are very accurate, this is because stresses are approximated in a “best-fit” sense of true stresses |




2.5 The DC, SC and aliasing paradigms 

The preceding examples have been so selectively chosen that we seem to have made
out a very strong, water-tight case for the SC paradigm. Let us, however, examine the
claims and merits, if any, of the competing DC paradigm for this problem. The argument
that FEM procedures look to satisfy the differential equations and boundary conditions
does not seem compelling enough to warrant further discussion. However, the
belief that finite elements seek to determine nodal displacements accurately was the basis
for the original derivation of optimal points (Barlow 1976), where the term ‘substitute
function’ is used instead of ‘alias’, and it is also the basis for what is called the ‘aliasing’
paradigm (MacNeal 1994). We recognize that this is precisely what we mean by the DC
paradigm here.

It is helpful to use the aliasing metaphor to explain what happens in finite element
analysis (FEA). The term aliasing is borrowed from sampled-data theory, where it is used
to describe the misinterpretation of a time signal by a sampling device. An original sine
wave is represented in the output of a sampling device by an altered sine wave of lower
frequency; this is called the alias of the true signal. This concept can be extended to finite
element discretization: the sample data points are now the values of the displacements
at the nodes (as MacNeal argued) or of the stresses at optimal points (as argued here), and the
alias is, respectively, the function which interpolates the displacements within the element
from the nodal displacements, or the stresses in the element from the stresses at the optimal
points. FEA can be imagined to be a sampler or processor which translates the true input into
an approximate or computed output.
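For readers unfamiliar with the sampling-theory origin of the term, a short illustration may help. Sampled at 10 samples per unit time, a sine wave of 9 cycles per unit time yields exactly the same samples as a phase-inverted sine wave of 1 cycle per unit time, which is its alias. The sampling rate and the two frequencies are arbitrary choices made only for this illustration.

```python
import numpy as np

fs = 10.0                            # sampling rate (samples per unit time), chosen for illustration
t = np.arange(20) / fs               # sampling instants
f_true = 9.0                         # above the Nyquist limit fs / 2
f_alias = abs(f_true - fs)           # apparent (aliased) frequency

true_samples = np.sin(2 * np.pi * f_true * t)
alias_samples = -np.sin(2 * np.pi * f_alias * t)   # phase-inverted low-frequency wave

print(np.max(np.abs(true_samples - alias_samples)))  # differs only by round-off
```

The sampler cannot distinguish the two waves. In the same way, the DC and SC paradigms differ in which quantities (nodal displacements or optimal-point stresses) they treat as the "samples".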

Table 1 summarizes the competing conceptual frameworks presented by the displacement
and stress correspondence paradigms. Both paradigms seem heuristically very
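Table 2. Barlow and Gauss points for the one-dimensional case.

| $p$ | Node locations | $u$ | $\bar u$ | $\varepsilon$ | $\bar\varepsilon$ | Gauss points | Barlow points (SC) | Barlow points (DC) |
|---|---|---|---|---|---|---|---|---|
| 1 | $\pm 1$ | $\xi^2$ | $\xi$ | $\xi$ | $1$ | $0$ | $0$ | $0$ |
| 2 | $0, \pm 1$ | $\xi^3$ | $\xi^2$ | $\xi^2$ | $\xi$ | $\pm 1/\sqrt{3}$ | $\pm 1/\sqrt{3}$ | $\pm 1/\sqrt{3}$ |
| 3 | $\pm 1/3, \pm 1$ | $\xi^4$ | $\xi^3$ | $\xi^3$ | $\xi^2$ | $0, \pm (3/5)^{1/2}$ | $0, \pm (3/5)^{1/2}$ | $0, \pm \sqrt{5}/3$ |

$1, \xi, \ldots, \xi^4$ indicate polynomial orders from constant to quartic.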

appealing. Which would one prefer? The DC paradigm is the universally accepted belief.
What is required now to establish that the SC paradigm is superior?

Let us now use the DC concept to derive the location of the optimal points, as Barlow 
did in 1976, or as MacNeal did more recently in 1994. We assume here that the finite 
element method seeks discretized displacement fields which are substitutes or aliases of 
the true displacement fields by sensing the nodal displacements directly. We can compare 
this with the SC interpretation, where the FEM is seen to seek discretized strain/stress
fields which are the substitutes/aliases of the true strain/stress fields in a ‘best-fit’ or ‘best
approximation’ sense. It is instructive now to see how the alternative paradigm, the DC
approach, leads to subtle differences in interpreting the relationship between the Barlow
points and the Gauss points.


2.5a A one-dimensional problem: We again take up the simplest problem, a bar under
axial loading. We shall assume that the bar is replaced by a single element of varying
polynomial order for its basis function (i.e., having a varying number of equally spaced nodes).
Thus, from table 2, we see that p = 1, 2, 3 correspond to basis functions of linear, quadratic
and cubic order, implying that the corresponding elements have 2, 3, 4 nodes respectively.
These elements are therefore capable of representing a constant, linear and quadratic state
of strain/stress, where strain is taken to be the first derivative of the displacement field.
We shall adopt the following notation: the true displacement, strain and stress fields will
be designated by $u$, $\varepsilon$ and $\sigma$; the discretized displacement, strain and stress fields by
$\bar u$, $\bar\varepsilon$ and $\bar\sigma$; and the DC displacement, strain and stress fields by $u^d$, $\varepsilon^d$ and
$\sigma^d$. Nodal displacements will be represented by

We shall examine a simple scenario where the true displacement field u is exactly one 
polynomial order higher than what the finite element is capable of representing - the 
Barlow points can be determined exactly in terms of the Gauss points only for this case. 

We shall now take for granted that the best-fit rule operates according to the orthogonality
condition expressed in (6) and that it can be used interchangeably for stresses and strains.
We shall designate the optimal points determined by the DC algorithm as $\xi_d$, the Barlow
points (DC), and the points determined by the SC algorithm as $\xi_s$, the Barlow points (SC).
Note that $\xi_d$ are the points established by Barlow (1976) and MacNeal (1994), while $\xi_s$
will correspond to the points given in Prathap (1993). The natural coordinate system $\xi$ is
used here for convenience.
Table 3. The Legendre polynomials $P_i$.

| Order of polynomial $i$ | Polynomial $P_i$ |
|---|---|
| 0 | $1$ |
| 1 | $\xi$ |
| 2 | $(1 - 3\xi^2)$ |
| 3 | $(3\xi - 5\xi^3)$ |
| 4 | $(3 - 30\xi^2 + 35\xi^4)$ |


Thus, for the present case, the use of the SC paradigm leads to

$$\int \delta\bar\varepsilon^T(\bar\varepsilon - \varepsilon)\,dV = 0. \qquad (11)$$

This case corresponds to one in which a straightforward application of Legendre
polynomials can be made. In this case, one can identify the points where $\bar\varepsilon = \varepsilon$ as
the zeros of the Legendre polynomials. See table 3 for
a list of unnormalized Legendre polynomials. We shall show below that in (11), the points
of minimum error are the sampling points of the Gauss-Legendre integration rule if $\bar\varepsilon$ is
exactly one polynomial order lower than $\varepsilon$.
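The general argument takes one line for a straight one-dimensional element. It assumes $dV$ is proportional to $d\xi$ (constant cross-section), takes $\delta\bar\varepsilon = P_j$ in (11), and uses the orthogonality $\int_{-1}^{1}P_iP_j\,d\xi = 0$ for $i \ne j$:

$$\varepsilon = \sum_{i=0}^{p} e_iP_i(\xi), \qquad \bar\varepsilon = \sum_{i=0}^{p-1} c_iP_i(\xi), \qquad \int_{-1}^{1} P_j\,(\bar\varepsilon - \varepsilon)\,d\xi = (c_j - e_j)\int_{-1}^{1}P_j^2\,d\xi = 0 \quad (j = 0, \ldots, p-1),$$

$$\Longrightarrow\; c_j = e_j, \qquad \varepsilon - \bar\varepsilon = e_p\,P_p(\xi).$$

The error therefore vanishes exactly at the $p$ zeros of $P_p$, which are the sampling points of the $p$-point Gauss-Legendre rule.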

We shall consider FEM solutions using a linear (two-node), a quadratic (three-node) and 
a cubic (four-node) element. The true displacement field is taken to be one order higher 
than the discretized field in each case. 

Linear element (p = 1)

$$u = \text{quadratic} = b_0 + b_1\xi + b_2\xi^2,$$

$$\varepsilon = \text{linear} = u_{,\xi} = b_1 + 2b_2\xi = \sum_{i=0}^{p=1} e_iP_i(\xi).$$

Note that we have written $\varepsilon$ in terms of the Legendre polynomials for future convenience.
Note also that we have simplified the algebra by assuming that strains can be written as 
derivatives in the natural co-ordinate system. It is now necessary to work out how the 
algebra differs for the DC and SC approaches. 

DC: At $\xi_i = \pm 1$, $u^d_i = u_i$; the points where $\varepsilon^d = \varepsilon$ are then given by $\xi_d = 0$. Thus, the
Barlow point (DC) is $\xi_d = 0$ for this case.

SC: $\bar u$ = linear is undetermined at first. Let $\bar\varepsilon = c_0$, as the element is capable of
representing only a constant strain. Equation (11) will now give $\bar\varepsilon = c_0 = b_1$. Thus, the
optimal point is $\xi_s = 0$, the point where the Legendre polynomial $P_1(\xi) = \xi$ vanishes.
Therefore, the Barlow point (SC) for this example is $\xi_s = 0$.
Quadratic element (p = 2)

$$u = \text{cubic} = b_0 + b_1\xi + b_2\xi^2 + b_3\xi^3,$$

$$\varepsilon = \text{quadratic} = u_{,\xi} = (b_1 + b_3) + 2b_2\xi - b_3(1 - 3\xi^2) = \sum_{i=0}^{p=2} e_iP_i(\xi).$$

DC: At $\xi_i = 0, \pm 1$, $u^d_i = u_i$; the points where $\varepsilon^d = \varepsilon$ are then given by $\xi_d = \pm 1/\sqrt{3}$.
Thus, the Barlow points (DC) are $\xi_d = \pm 1/\sqrt{3}$ for this case.

SC: $\bar u$ = quadratic. Let $\bar\varepsilon = c_0 + c_1\xi$, as this element is capable of representing a linear
strain. Equation (11) will now give $\bar\varepsilon = (b_1 + b_3) + 2b_2\xi$. Thus, the optimal points are
$\xi_s = \pm 1/\sqrt{3}$, the points where the Legendre polynomial $P_2(\xi) = (1 - 3\xi^2)$ vanishes.
Therefore, the Barlow points (SC) for this example are $\xi_s = \pm 1/\sqrt{3}$.

Note that in these two examples, i.e., for the linear and quadratic elements, the Barlow
points from both schemes coincide with the Gauss points (the points where the
corresponding Legendre polynomials vanish). This also explains why, for a very long time, the
DC paradigm remained the prevailing wisdom. In our next example we will find that this
is no longer so.


Cubic element (p = 3)

$$u = \text{quartic} = b_0 + b_1\xi + b_2\xi^2 + b_3\xi^3 + b_4\xi^4,$$

$$\varepsilon = \text{cubic} = u_{,\xi} = (b_1 + b_3) + \left(2b_2 + \frac{12b_4}{5}\right)\xi - b_3(1 - 3\xi^2) - \frac{4b_4}{5}(3\xi - 5\xi^3) = \sum_{i=0}^{p=3} e_iP_i(\xi).$$

DC: At $\xi_i = \pm 1/3, \pm 1$, $u^d_i = u_i$; the points where $\varepsilon^d = \varepsilon$ are then given by $\xi_d = 0, \pm\sqrt{5}/3$.
Thus, the Barlow points (DC) are $\xi_d = 0, \pm\sqrt{5}/3$ for this case. Note that the
points where the Legendre polynomial $P_3(\xi) = (3\xi - 5\xi^3)$ vanishes are $\xi = 0, \pm(3/5)^{1/2}$!


SC: $\bar u$ = cubic. Let $\bar\varepsilon = c_0 + c_1\xi + c_2(1 - 3\xi^2)$, as this element is capable of representing a
quadratic strain. Equation (11) will now give $\bar\varepsilon = (b_1 + b_3) + (2b_2 + 12b_4/5)\xi - b_3(1 - 3\xi^2)$.
Thus, the Barlow points (SC) for this example are $\xi_s = 0, \pm(3/5)^{1/2}$, the points where the
Legendre polynomial $P_3(\xi) = (3\xi - 5\xi^3)$ vanishes.

Therefore, we have an example where the DC paradigm does not give the correct picture
of the way the finite element process computes strains. However, the SC paradigm shows
that as long as the discretized strain is one order lower than the true strain, the corresponding
Gauss points are the optimal points. Table 2 summarizes the results obtained so far.
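Table 2 can be regenerated with a few lines of NumPy. This sketch works under the assumptions of § 2.5a: one element on $-1 \le \xi \le 1$ with equally spaced nodes, true displacement $\xi^{p+1}$, and strain taken as $d/d\xi$. The function name `barlow_points` is ours. The DC points are where the derivative of the nodal interpolation error vanishes. The SC points are the zeros of the residual left after the Legendre truncation implied by (11).

```python
import numpy as np
from numpy.polynomial import Polynomial
from numpy.polynomial import legendre as leg


def barlow_points(p):
    """Illustrative helper (our name): DC, SC and Gauss points for a single element
    of order p with equally spaced nodes on [-1, 1], when the true displacement
    is one order higher, u = xi**(p + 1)."""
    nodes = np.linspace(-1.0, 1.0, p + 1)

    # DC: u^d interpolates u at the nodes, so u - u^d is the monic nodal polynomial
    # prod(xi - xi_i); the strains agree where its derivative vanishes.
    dc = Polynomial.fromroots(nodes).deriv().roots()

    # SC: best-fit (L2) projection of the true strain (p + 1) * xi**p onto degree p - 1.
    strain = (p + 1) * Polynomial.basis(p).coef
    c = leg.poly2leg(strain)
    residual = np.zeros_like(c)
    residual[p] = c[p]              # the only part the projection cannot represent
    sc = leg.legroots(residual)

    gauss, _ = leg.leggauss(p)      # p-point Gauss-Legendre sampling points
    return np.sort(dc.real), np.sort(sc.real), np.sort(gauss)


for p in (1, 2, 3):
    dc, sc, gauss = barlow_points(p)
    print(f"p = {p}: DC {np.round(dc, 4)}  SC {np.round(sc, 4)}  Gauss {np.round(gauss, 4)}")
```

For p = 1 and 2 all three columns agree. For p = 3 the DC column gives $0, \pm\sqrt{5}/3$ while SC and Gauss both give $0, \pm(3/5)^{1/2}$, as in table 2.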



Finite element analysis and the stress correspondence paradigm 




Our experience is that the SC model is the one that corresponds to reality: if one
were to actually solve a problem where the true strain varies cubically using a 4-noded
element, which offers a discretized strain of quadratic order, the points of optimal
strain actually coincide with the Gauss points, and not with the points predicted by the DC paradigm
(see § 3.1 below).

2.6 Axiomatization: the ‘best-fit’ nature of the SC paradigm from a variational theorem

Our investigation here will be complete in all respects if the best-fit nature of stress
correspondence can be deduced logically and quantitatively from a basic principle. This is
the process known as axiomatization: the motivation is now to compress the paradigm, or
derive it from a single axiom or a minimal set of axioms or fundamental principles. In fact, some
recent work (Prathap 1993) shows that by taking an enlarged view of the variational basis
for the displacement type FEM approach we are actually led to the conclusion that
strains or stresses are always sought in the best-fit manner. With the help of hindsight we
know that a similar axiomatization from a basic principle does not seem to be possible for
the DC paradigm.

The ‘best-fit’ manner in which finite elements compute strains can be shown to follow
from an interpretation using the Hu-Washizu theorem. To see how we progress from
the continuum domain to the discretized domain, we will find it most convenient to
develop the theory from the generalized Hu-Washizu theorem (Hu 1955) rather than the
minimum potential theorem. These theorems belong to a family of the most basic
statements (the least action principles) that can be made about the laws of nature, of
matter, motion and energy. The minimum potential theorem corresponds to the conventional
energy theorem. However, for applications to problems in structural and solid
mechanics, Hu (1955) proposed a generalized theorem which had somewhat more flexibility.
Its usefulness came to be recognized when one had to grapple with some of the
problems raised by finite element modelling. One such puzzle is the rationale for the ‘best-fit’
paradigm.

Let the continuum linear elastic problem have an exact solution described by the
displacement field $u$, strain field $\varepsilon$ and stress field $\sigma$ (we assume that the strain field $\varepsilon$ is
derived from the displacement field through the strain-displacement gradient operators of
the theory of elasticity, and that the stress field is derived from the strain field through the
constitutive laws). Let us now replace the continuum domain by a discretized domain and
describe the computed state by the quantities $\bar u$, $\bar\varepsilon$ and $\bar\sigma$, where again we
take it that the strain and stress fields are computed from the strain-displacement and
constitutive relationships. It is clear that $\bar\varepsilon$ is an approximation of the true strain field $\varepsilon$.
What the Hu-Washizu theorem does is to introduce a ‘dislocation potential’ to augment
the usual total potential. This dislocation potential is based on a third independent stress
field $\hat\sigma$, which can be considered to be the Lagrange multiplier removing the lack of
compatibility between the true strain field $\varepsilon$ and the discretized strain field $\bar\varepsilon$. Note
that $\hat\sigma$ is now an approximation of $\sigma$. The three-field Hu-Washizu theorem can be stated
as

$$\delta\pi = 0, \qquad (12)$$
where

$$\pi = \int \left\{ \bar\sigma^T\bar\varepsilon/2 + \hat\sigma^T(\varepsilon - \bar\varepsilon) \right\} dV + P, \qquad (13)$$

and $P$ is the potential energy of the prescribed loads.

In the simpler minimum total potential principle, which is the basis for the derivation of
the displacement type finite element formulation in most textbooks, only one field (i.e.,
the displacement field $u$) is subject to variation. However, in this more general three-field
approach, all three fields are subject to variation, and this leads to three sets of equations
which can be grouped and classified as follows.


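| Variation on | Nature | Equation | No. |
|---|---|---|---|
| $u$ | Equilibrium | $\nabla\hat\sigma$ + terms from $P$ = 0 | (14a) |
| $\hat\sigma$ | Orthogonality (compatibility) | $\int \delta\hat\sigma^T(\bar\varepsilon - \varepsilon)\,dV = 0$ | (14b) |
| $\bar\varepsilon$ | Orthogonality (equilibrium) | $\int \delta\bar\varepsilon^T(\hat\sigma - \bar\sigma)\,dV = 0$ | (14c) |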


Equation (14a) shows that the variation on the displacement field $u$ requires that the
independent stress field $\hat\sigma$ must satisfy the equilibrium equations ($\nabla$ signifies the operators that
describe the equilibrium condition). Equation (14c) is a variational condition to restore the
equilibrium imbalance between $\hat\sigma$ and $\bar\sigma$. In the displacement type formulation, we choose
$\hat\sigma = \bar\sigma$. This satisfies the orthogonality condition in (14c) identically, and leaves us
with the orthogonality condition in (14b). We can now argue that this tries to restore the
compatibility imbalance between the exact strain field $\varepsilon$ and the discretized strain field $\bar\varepsilon$.
In the displacement type formulation this can be stated as

$$\int \delta\bar\sigma^T(\varepsilon - \bar\varepsilon)\,dV = 0. \qquad (15)$$

Thus we see very clearly that the strains computed by the finite element procedure are 
a variationally correct (in a sense, a least squares correct) ‘best approximation’ of the 
true state of strain. There is therefore a uniquely defined correspondence between the 
approximate stress and the true stress in such finite element computations. 

3. Numerical experiments

So far, our knowledge has been based on theoretical derivations from fundamental
principles and paradigms which originated from intelligent or intuitive conjecture or guess.
The deductions we made quantitatively in § 2 from the stress correspondence paradigm
belonged to this category. Bertrand Russell had pointed out that knowledge based only on
universal principles is sterile; it is the world of Platonic ideas. Science needs proof, in the
form of empiricism; only then does it become complete. In our present discipline, which
is finite element modelling of problems in structural engineering, these experiments would
Table 4. Case a: where are the optimal stress points? ($-1 \le \xi \le 1$)

| Element | Observed/computed | Expected/predicted (SC) | Expected/predicted (DC) |
|---|---|---|---|
| 2-node | $0$ | $0$ | $0$ |
| 3-node | $\pm 1/\sqrt{3}$ | $\pm 1/\sqrt{3}$ | $\pm 1/\sqrt{3}$ |
| 4-node | $0, \pm\sqrt{3/5}$ | $0, \pm\sqrt{3/5}$ | $0, \pm\sqrt{5}/3$ |


not be actual physical ones but would be numerical, computational or digital in nature. We
must therefore adopt the following course of action to ensure that our understanding of
the stress correspondence paradigm is scientifically complete and coherent: starting with
a guess (that the stress correspondence paradigm gives the correct description of the
finite element analysis procedure), we predict quantitatively the various consequences of the
guess (for example, where the optimal stress points should be), and then design and conduct
numerical experiments that would verify (falsify) that the real, observed situation (as
obtained from a routine ‘first order’ tradition of finite element computation) agrees (disagrees)
with these predicted or expected values (from the ‘second order’ traditional exercise).

In § 2.5a we elaborated on a simple bar element model under various loading
conditions, and we made predictions for the location of the optimal points according to the
DC and SC paradigms. This will now be experimentally verified as case a below. The three
other examples we choose are: case b, the rate of convergence of a beam element; case c,
the transverse shear stress for the fundamental thickness shear mode of a hinged-hinged
beam; and case d, the shear force resultant in an axisymmetric circular plate. Case b is
based on Timoshenko theory, and cases c and d use elements based on a higher order shear
deformation theory.

3.1 Case a - Location of optimal stress points 

In § 2.5a we worked out analytically the locations of the optimal points according
to the DC and SC paradigms. We shall now perform numerical experiments using
2-node, 3-node and 4-node bar elements, and in each case apply distributed axial loads
whose intensity varies so as to produce linear, quadratic and cubic variations of axial strain
along the length. Table 4 compares the results observed/computed from experiment with
those deduced analytically from the competing paradigms, listed under the column
expected/predicted. Note that for the two simpler elements the DC paradigm proved to
be deceptively accurate; only for the cubic element do we see that the SC
paradigm alone makes the correct prediction.
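A numerical experiment of this kind can be sketched as follows. It is not the author's program. It uses a single $(p+1)$-node bar element on $-1 \le \xi \le 1$ with $EA = 1$, fixed at $\xi = -1$, and manufactures the loading from an assumed exact displacement $u = (1 + \xi)^{p+1}$: a distributed load $q = -u''$ plus the matching end load $u'(1)$ at the free end, so that the exact strain is of degree $p$. This loading choice and the helper names are ours. The script returns the points where the computed strain crosses the exact strain.

```python
import numpy as np
from numpy.polynomial import Polynomial
from numpy.polynomial.legendre import leggauss


def lagrange_basis(nodes):
    """Lagrange shape functions N_i(xi) for the given node coordinates."""
    basis = []
    for i, xi_i in enumerate(nodes):
        num = Polynomial.fromroots(np.delete(nodes, i))
        basis.append(num * (1.0 / num(xi_i)))      # normalise so N_i(xi_i) = 1
    return basis


def observed_optimal_points(p, n_gauss=8):
    """Illustrative experiment (our construction): one (p+1)-node bar element on
    -1 <= xi <= 1, EA = 1, fixed at xi = -1, loaded so that the exact displacement
    is u = (1 + xi)**(p + 1). Returns the points where FE strain = exact strain."""
    nodes = np.linspace(-1.0, 1.0, p + 1)
    N = lagrange_basis(nodes)
    dN = [n.deriv() for n in N]

    u_exact = Polynomial([1.0, 1.0]) ** (p + 1)     # satisfies u(-1) = 0
    eps_exact = u_exact.deriv()                      # exact strain, degree p
    q = -eps_exact.deriv()                           # distributed load from -(EA u')' = q

    g, w = leggauss(n_gauss)                         # exact for these polynomial integrands
    K = np.array([[np.sum(w * a(g) * b(g)) for b in dN] for a in dN])
    f = np.array([np.sum(w * n(g) * q(g)) for n in N])  # consistent nodal loads
    f[-1] += eps_exact(1.0)                          # end load EA u'(1) at the free end

    d = np.zeros(p + 1)
    d[1:] = np.linalg.solve(K[1:, 1:], f[1:])        # essential condition d[0] = 0
    eps_fe = sum((dn * float(di) for dn, di in zip(dN, d)), Polynomial([0.0]))
    return np.sort((eps_exact - eps_fe).roots().real)


for p in (1, 2, 3):
    print(f"{p + 1}-node element: {np.round(observed_optimal_points(p), 4)}")
```

Under these assumptions the crossing points coincide with the Gauss points for all three elements, including $0, \pm\sqrt{3/5}$ for the 4-node element, in line with the observed column of table 4.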

3.2 Case b - Rate of convergence for a tip-loaded cantilever idealized with linear Timoshenko
beam elements

Figure 4 shows a cantilever beam under tip load. The dimensions are chosen such that the
tip deflection under the load will be $w = 4.0$. The example chosen represents a thin beam
Figure 4. Cantilever beam under tip load (P = 1.0, L = 100).

so that the influence of shear deformation and shear strain energy is negligible. The finite
element idealization is performed using equal-length two-node $C^0$ beam elements based
on independent linear interpolations of the transverse displacement $w$ and the normal rotation
$\theta$. This element permits constant bending and shear strain accuracy within each element,
the simplest representation possible under the circumstances and therefore an advantage
in seeing how it works in this problem.

The second column of table 5 shows the tip deflections obtained from the finite element
digital computation. This was the actual epistemological sequence in which the
understanding was obtained: these results were known to this writer in 1980, well before
any explanatory paradigm was offered. What was noticed was that if $w$ is the true (i.e.,
analytical) solution to the exact problem and $w_o$ the deflection observed from the finite
element experiment, then the quantity $(w - w_o)/w$ turned out to be exactly $1/4N^2$,
where $N$ is the number of elements used. The predictions based on the SC paradigm were
made much later, around 1988.
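The convergence pattern in table 5 can be reproduced with a small script. This sketch rests on our reading of the element described above: linear $w$ and $\theta$, an exact (constant) bending strain, and the shear strain $dw/dx - \theta$ sampled once at the element mid-point so that it too is constant. Because the material and section data of figure 4 are not fully legible here, $EI$ is simply chosen so that the exact tip deflection $PL^3/3EI$ equals 4.0. The shear rigidity `kGA` is set to a large, hypothetical value so that shear deformation is negligible, as for the thin beam in the text.

```python
import numpy as np


def tip_deflection(n_el, L=100.0, P=1.0, EI=None, kGA=1.0e8):
    """Tip deflection of a cantilever (clamped at x = 0, transverse load P at x = L)
    modelled with n_el equal two-node Timoshenko elements (linear w and theta).
    kGA is a hypothetical large shear rigidity (thin-beam limit)."""
    if EI is None:
        EI = P * L**3 / 12.0                           # makes exact P L^3 / (3 EI) = 4.0
    h = L / n_el
    # Element DOFs (w1, theta1, w2, theta2).
    Bb = np.array([0.0, -1.0, 0.0, 1.0]) / h          # bending strain d(theta)/dx
    Bs = np.array([-1.0 / h, -0.5, 1.0 / h, -0.5])     # dw/dx - theta at the mid-point
    k_el = h * (EI * np.outer(Bb, Bb) + kGA * np.outer(Bs, Bs))
    n_dof = 2 * (n_el + 1)
    K = np.zeros((n_dof, n_dof))
    for e in range(n_el):
        idx = np.arange(2 * e, 2 * e + 4)
        K[np.ix_(idx, idx)] += k_el
    F = np.zeros(n_dof)
    F[-2] = P                                          # transverse tip load
    d = np.zeros(n_dof)
    d[2:] = np.linalg.solve(K[2:, 2:], F[2:])          # clamp w and theta at x = 0
    return d[-2]


for n in (1, 2, 4):
    w_o = tip_deflection(n)
    print(n, round(w_o, 4), (4.0 - w_o) / 4.0, 1.0 / (4 * n**2))
```

On this reading the computed deflections match the 3.0000, 3.7500 and 3.9375 of table 5, and the relative error matches $1/4N^2$ apart from the tiny shear contribution $PL/kGA$.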

The challenge now is to see if this relationship describing the rate of convergence can 
be established by arguing that it emerges from the fact that FEA operates according to the 
SC paradigm and that within this paradigm, strains are sought in the ‘best-fit’ manner. We 
pay attention to the bending moment variation observed from the finite element model and 
its relation to the actual bending moment variation in this problem. Figure 5 shows the 
bending moments obtained from 1, 2 and 4-element idealizations of the present problem. 
The true bending moment (shown by the solid line) varies linearly. The computed bending 
moments are distributed in a piecewise constant manner as shown by the broken lines.
In each case, the elements pick up the bending moment at the centroid correctly, i.e., in a
‘best-fit’ manner. We shall now attempt to relate this to the accuracy (and convergence) of 
results. 

Table 5. Tip deflections of a thin cantilever beam, L/t = 100.

No. of      Observed/computed    Expected/predicted
elements    $w_o$                $w_e$
1           3.0000               3.0000
2           3.7500               3.7500
4           3.9375               3.9375

Figure 5. Bending moment diagrams for 1-, 2- and 4-element idealizations of
a cantilever beam under tip load.

Consider the case where the beam is modelled by equal-length beam elements, so that
each beam segment of length $2l$ is replaced by an element of length $2l$. Let the moment and
shear force at the centroid be M and V. Thus the true bending moment over the element
region for our problem can be taken to vary as $M + Vx$, with $x$ measured from the element
centroid ($-l \le x \le l$). This follows from the simple fact that equilibrium requires the
rate of change of bending moment to be equal to the shear force. The discretized bending
moment sensed by our linear element would therefore be M - it cannot do any better. We
shall now compute the actual bending energy in the element region (i.e., from a continuum
analysis) and that given by the finite element model. We can show that

$$\text{Energy in continuum model} = \frac{l}{EI}\left(M^2 + \frac{V^2 l^2}{3}\right), \qquad (16a)$$

$$\text{Energy in discretized model} = \frac{l}{EI}\,M^2. \qquad (16b)$$

Thus, as a result of the discretization process involved in replacing each continuum segment
of length $2l$ by a linear Timoshenko beam element which can give only a constant value
M for the bending moment, there is a reduction (error) in energy in each element equal to
$(l/EI)(V^2 l^2/3)$. For the cantilever beam of length L with a tip load P, every element
carries $V = P$ and has $l = L/(2N)$. Summing this reduction over the N elements therefore gives
$P^2 L^3/(24 N^2 EI)$. The total reduction in strain energy of the discretized model for the
beam is thus $U/(4N^2)$, where $U = P^2 L^3/(6EI)$ is the energy of the beam under tip load.
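The energy bookkeeping of (16a) and (16b) can be checked numerically. The hypothetical helper below does this in three steps. First, it samples the exact moment $P(L - x)$ at Gauss points in each element. Second, it subtracts the energy of the constant centroid moment. Third, it divides the total loss by $U = P^2L^3/(6EI)$. The result should be $1/(4N^2)$. The choice $EI = 1$ is arbitrary, because the ratio does not depend on it.

```python
import numpy as np


def bending_energy_loss_ratio(n_elem, L=100.0, P=1.0, EI=1.0):
    """Fraction of the exact bending energy lost by a centroid-constant moment field.

    The exact moment of the tip-loaded cantilever is M(x) = P (L - x). Each
    element of length 2l retains only its centroid moment, as in (16b).
    """
    xg, wg = np.polynomial.legendre.leggauss(4)  # exact for these quadratics
    l = L / (2 * n_elem)                         # half-length of an element
    lost = 0.0
    for e in range(n_elem):
        xc = (2 * e + 1) * l                     # element centroid
        x = xc + l * xg                          # Gauss points in the element
        M_true = P * (L - x)
        M_fe = P * (L - xc)                      # constant 'best-fit' moment
        # (1 / 2EI) * integral of (M_true^2 - M_fe^2) over the element
        lost += l * np.sum(wg * (M_true**2 - M_fe**2)) / (2.0 * EI)
    U = P**2 * L**3 / (6.0 * EI)                 # exact strain energy
    return lost / U


for n in (1, 2, 4, 8):
    print(n, bending_energy_loss_ratio(n), 1.0 / (4 * n * n))
```

The cross term $2MVx$ integrates to zero over each symmetric element. Only $V^2x^2$ survives, which is why the loss per element is exactly $(l/EI)(V^2l^2/3)$.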

We can now show that this error in strain energy translates into an error in the deflection
under load P. Since the beam is linear, its strain energy is quadratic in P. Castigliano's second
theorem then gives the deflection under the load as $\partial U/\partial P = 2U/P$. Applying this to the
energies from (16a) and (16b), the tip deflection $w$ of the continuum and that expected (or
predicted) from the SC paradigm, $w_e$, will differ as $(w - w_e)/w = 1/(4N^2)$. The third column
in table 5 shows this expected or predicted rate of convergence. This follows from the fact that if
any linear variation is approximated in a piecewise manner by constant values as seen in figure 5,
this is the manner in which the square of the error in the stresses/strains (or, equivalently,
the difference in work or energy) will converge with idealization.
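A minimal sketch of this step follows. It assumes the beam data $P = 1$, $L = 100$ and $EI = 10^5 \times 10/12$, chosen so that $PL^3/3EI = 4$. The discretized model retains the energy $U(1 - 1/(4N^2))$, and for a linear system $w = 2U/P$. The predicted deflections $w_e$ follow directly. The function name is hypothetical.

```python
def predicted_tip_deflection(n_elem, L=100.0, P=1.0, EI=1.0e5 * 10.0 / 12.0):
    """SC-paradigm prediction w_e for the tip-loaded thin cantilever.

    The discretized model stores U (1 - 1/(4 N^2)); the strain energy is
    quadratic in P, so Castigliano gives w = dU/dP = 2U/P.
    """
    U = P**2 * L**3 / (6.0 * EI)                 # exact bending energy
    U_e = U * (1.0 - 1.0 / (4.0 * n_elem**2))    # energy retained by the model
    return 2.0 * U_e / P


for n in (1, 2, 4):
    print(n, f"{predicted_tip_deflection(n):.4f}")  # 1 3.0000 / 2 3.7500 / 4 3.9375
```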

The agreement between observed and expected results is exact. Thus from the SC 
paradigm and using a simple example, we can deduce quantitatively that the rate of
convergence observed in the numerical experiment could be exactly predicted. This reinforces
again our conviction that the SC paradigm is the correct description of the finite element 
process. 
Figure 6. Hinged-hinged beam undergoing fundamental thickness-shear
vibration.

3.3 Case c - the transverse shear stress distribution for the fundamental thickness-shear
mode of a hinged-hinged beam

Figure 6 shows a hinged-hinged beam of depth d, length l and rectangular cross-section,
and for simplicity an assumed shear modulus G = 1. It is possible for the beam to vibrate
in what is called a fundamental thickness-shear mode - without transverse deflection (i.e.,
w = 0), the displacement being entirely parallel to the neutral axis. The mode-shape
describing the cross-sectional distortion through the depth is

$$u = \sin(\pi z/d), \qquad (17)$$

where z is measured from the neutral axis of the beam. Since w = 0 and G = 1, the shear
strain and the shear stress are both equal to $\partial u/\partial z$. The shear strain/stress distribution therefore varies as

$$\tau_{xz} = (\pi/d)\cos(\pi z/d), \qquad (18)$$

through the thickness. Note that this distribution satisfies the traction-free conditions
specified at the top and bottom surfaces ($\tau_{xz} = 0$ at $z = \pm d/2$). The nature of
motion is such that it produces a pure shear stress state that varies only through the
thickness and not along the length.

We shall now model this problem using finite elements based on what is called a
higher-order shear deformation theory. The displacement field chosen for such a problem is
quasi-two-dimensional (stresses are now functions of both x and z), as can be seen
below:

$$u = u_0 + z\theta + z^2 u_0^* + z^3 \theta^*, \qquad (19a)$$

$$w = w_0 + z\psi + z^2 w_0^*. \qquad (19b)$$

If a three-noded beam element is the basis for the finite element formulation, then the 
degrees of freedom are interpolated in the x-direction using quadratic shape functions. 
Also, the variations in the z-direction are as depicted in (19). The discretization process 
is now two-dimensional - in addition to the approximation along the beam axis (x-axis) 
represented by interpolations of variables defined at nodes spaced along the length using 
conventional shape functions, there is also an approximation in the thickness direction 
(z-axis) represented as a Taylor series expansion in terms of variables available at the node 
at that location. This allows us to examine the nature of the computed stress variation in 
the thickness direction when elements permitting this are used to see if the best-fit stress 
correspondence paradigm covers this problem as well. 

Computations for hinged-hinged beams of various slenderness ratios (d/l ranging from
0.1 to 3) showed that the lowest thickness-shear frequency was picked up accurately. The
cross-sectional distortion pattern obtained from the FEM model for the thickness-shear
mode showed that the vibration had no transverse or symmetric axial deflection ($w_0$, $\psi$,
$w_0^*$, $u_0$, $u_0^* < 10^{-14}$). It showed only a predominantly antisymmetric axial deflection pattern,
characterized by $\theta^* d^2/\theta = -1.399$ for all the values of d/l considered and at all sections
along the length of the hinged-hinged beam.

Figure 7. Axisymmetric circular plate under uniformly distributed transverse load.

We shall now try to rationalize the observation that the mode-shape obtained from the
finite element computations was characterized by the value $\theta^* d^2/\theta = -1.399$ for all
d/l considered. We can simplify the analysis by noting that for this mode, the following
description suffices:

$$\bar{u} = z\theta + z^3\theta^*, \qquad (20a)$$

$$\bar{\tau}_{xz} = \theta + 3z^2\theta^*. \qquad (20b)$$


We shall derive predictions using both the DC and SC paradigms. In the former we shall
argue that $\bar{u}$ replaces $u$ in a best-fit manner, and in the latter that $\bar{\tau}_{xz}$ replaces
$\tau_{xz}$ in a similar manner. The variation $\delta$ is carried out over the generalized degrees of
freedom $\theta$ and $\theta^*$, with the integrals taken over the depth $-d/2 \le z \le d/2$. This leads to
two simultaneous equations in each case and finally to a value of $\theta^* d^2/\theta$ for each case:

$$\text{(DC)} \quad \int \delta\bar{u}\,(\bar{u} - u)\,dz = 0 \quad \text{gives} \quad \theta^* d^2/\theta = -1.448,$$

$$\text{(SC)} \quad \int \delta\bar{\tau}_{xz}\,(\bar{\tau}_{xz} - \tau_{xz})\,dz = 0 \quad \text{gives} \quad \theta^* d^2/\theta = -1.402.$$

Only the SC prediction agrees very closely with the computed value of -1.399. The DC
paradigm is therefore not successful in predicting this factor.
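Both predictions can be reproduced with a few lines of quadrature. The sketch below uses a hypothetical helper, `mode_ratio`, with Gauss-Legendre quadrature through the depth. The default is d = 1, since by scaling the ratio $\theta^* d^2/\theta$ does not depend on d. For each paradigm it solves the two normal (Galerkin) equations for $\theta$ and $\theta^*$. It reproduces only the two paradigm predictions, not the finite element value -1.399.

```python
import numpy as np


def mode_ratio(paradigm, d=1.0, n_gauss=20):
    """theta* d^2 / theta from a best-fit (least-squares) match through the depth.

    DC: fit u_bar = z theta + z^3 theta* to u = sin(pi z / d), as in (17), (20a).
    SC: fit tau_bar = theta + 3 z^2 theta* to tau = (pi / d) cos(pi z / d),
    as in (18), (20b).
    """
    s, ws = np.polynomial.legendre.leggauss(n_gauss)
    z = 0.5 * d * s                              # map [-1, 1] onto [-d/2, d/2]
    wz = 0.5 * d * ws
    if paradigm == "DC":
        basis = np.stack([z, z**3])              # multiplies (theta, theta*)
        target = np.sin(np.pi * z / d)
    elif paradigm == "SC":
        basis = np.stack([np.ones_like(z), 3.0 * z**2])
        target = (np.pi / d) * np.cos(np.pi * z / d)
    else:
        raise ValueError("paradigm must be 'DC' or 'SC'")
    # Galerkin conditions: integral of basis_k * (fit - target) dz = 0, k = 1, 2
    G = (basis * wz) @ basis.T
    r = (basis * wz) @ target
    theta, theta_star = np.linalg.solve(G, r)
    return theta_star * d**2 / theta


for p in ("DC", "SC"):
    print(p, round(float(mode_ratio(p)), 3))     # DC -1.448, SC -1.402
```

The only difference between the two paradigms is which field is fitted: displacements (DC) or shear stresses (SC). The same quadrature and solver serve both.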


3.4 Case d - the shear force resultant in an axisymmetric circular plate 


Figure 7 shows a simply-supported circular plate loaded by a uniformly distributed
transverse loading of intensity q = 1.0. It is modelled using axisymmetric plate elements
based on the same higher-order theory that was used in case c above (see § 3.3). Again,
three-node elements are used - these elements can represent the shear stress resultant
exactly up to a linear variation along the length of the element. Interestingly, for this
problem, the shear stress resultant $Q = qr/2$ varies linearly along the radius of the plate,
since vertical equilibrium of a central disc of radius r gives $2\pi r Q = \pi r^2 q$. Thus, according
to the SC paradigm, the finite element model should be able to pick up the shear stress
resultant exactly.
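The claim that a best-fit procedure exactly reproduces any field lying within the element's interpolation space can be illustrated without the plate element itself. The sketch below is only a stand-in, not the higher-order plate element of the paper. It projects $Q = qr/2$ onto four 3-node quadratic Lagrange elements over $0 \le r \le 1$, using the axisymmetric weight $r\,dr$. The name `project_on_quadratic_elements` is hypothetical.

```python
import numpy as np


def project_on_quadratic_elements(f, n_elem=4, a=1.0):
    """Best-fit (weighted L2) projection of f(r) onto 3-node quadratic elements.

    Hypothetical helper: minimises the integral of (Q_h - f)^2 r dr over [0, a]
    (the axisymmetric weight) and returns the node radii and nodal values.
    """
    nodes = np.linspace(0.0, a, 2 * n_elem + 1)  # end, mid, end, mid, ...
    M = np.zeros((nodes.size, nodes.size))
    b = np.zeros(nodes.size)
    xi, wg = np.polynomial.legendre.leggauss(4)  # exact up to degree 7
    # quadratic Lagrange shape functions for local nodes at xi = -1, 0, +1
    N = np.stack([0.5 * xi * (xi - 1.0), 1.0 - xi**2, 0.5 * xi * (xi + 1.0)])
    for e in range(n_elem):
        idx = [2 * e, 2 * e + 1, 2 * e + 2]
        jac = 0.5 * (nodes[idx[2]] - nodes[idx[0]])
        r = nodes[idx[1]] + jac * xi             # Gauss points in physical r
        wr = wg * r * jac                        # quadrature weight times r dr
        M[np.ix_(idx, idx)] += (N * wr) @ N.T
        b[idx] += (N * wr) @ f(r)
    return nodes, np.linalg.solve(M, b)


q = 1.0
nodes, Q = project_on_quadratic_elements(lambda r: q * r / 2.0)
print(np.abs(Q - q * nodes / 2.0).max())         # round-off level only
```

Because a linear $Q$ lies inside the quadratic space, the best fit returns it exactly at every node, including $r = 0.25, 0.5, 0.75, 1.0$. A field outside that space would instead be replaced by its projection.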

Table 6 shows the results from a four-element model of the circular plate. We see from
the fourth and fifth columns that the observed values (from the FE computation) and the
expected values (from the SC paradigm predictions) agree to seven decimal figures. On
the other hand, the observed and expected transverse deflections agree only to two decimal
places! This is of course a specially chosen example to highlight the SC paradigm. In
most general computations, the computed displacements are, on average, more reliable
than the computed stresses. However, the SC paradigm allows us to take advantage
of the fact that stresses at optimal points are of comparable or greater accuracy than
displacements.

Table 6. Case d - Transverse displacement w and shear force resultant Q for a simply-supported
axisymmetric circular plate under uniformly distributed load from a 4-element model using 3-noded
higher-order axisymmetric plate elements.

              w                       Q
r        Observed   Expected    Observed    Expected
0.00     0.1902     0.1875      0           0
0.25     0.1658     0.1648      0.1250000   0.125001
0.50     0.1064     0.1055      0.2500000   0.250001
0.75     0.0357     0.0359      0.3750000   0.375001
1.00     0          0           0.5000000   0.500001


4. What does the finite element method do? 

The persistence of the DC paradigm for a very long time can be attributed to the fact
that it is a very natural or obvious or common-sensical interpretation of what seems to
be happening in a finite element computation. After all, at the end of the computation,
the global degrees of freedom are usually the nodal displacements, which are
presented first, and the strains/stresses are seemingly processed from these displacements.
It was therefore widely believed that the finite element method sought approximations to
the displacement fields and that the strains/stresses were computed by differentiating these
fields. Thus, elements were believed to be "capable of representing the nodal displacements
in the field to a good degree of accuracy". Each finite element samples the displacements
at the nodes, and internally, within the element, the displacement field is interpolated using
the basis functions. The strain fields are computed from these using a process that involves
differentiation. It is argued further that, as a result, displacements are more accurately
computed than the strain and stress fields. This follows from the generally accepted axiom
that derivatives of functions are less accurate than the original functions. It is also argued
that strains/stresses are usually most inaccurate at the nodes and that they are of greater
accuracy near the element centres - this, it is thought, is a consequence of the mean value
theorem for derivatives.

However, we have demonstrated theoretically and by using numerical results that, in
actual fact, the Ritz approximation process and the displacement type FEM, which can be
interpreted as a piecewise Ritz procedure, do exactly the opposite and less common-sensical
thing - it is the strain fields which are computed, almost independently as
it were, within each element. This can be derived in a formal way - many attempts have been
made to give expression to this idea, but the present writer feels that the most intellectually
satisfying proof can be arrived at by starting with the Hu-Washizu theorem. Having shown
that the Ritz-type procedures determine strains, it follows that the displacement fields are
then constructed from these in an integral sense. The globally assembled stiffness matrix
equation actually reflects this integration process and the continuity of fields across
element boundaries, while the suppression of the field values at domain edges reflects the
imposition of boundary conditions. It must therefore be argued that displacements are, on
average, more accurate than strains because integrals of smooth functions are generally
more accurate than the original data. We have thus turned the whole argument on its head.
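A one-dimensional sketch makes the integration argument concrete. Assume, as in the cantilever example above, that each element of the tip-loaded cantilever carries its centroid curvature. The hypothetical function below integrates this piecewise-constant curvature twice from the clamped root. It then compares the relative curvature error, which is largest at the element ends, with the relative tip-deflection error. The beam data $P = 1$, $L = 100$ and $EI = 10^5 \times 10/12$ are assumptions. This illustrates only the smoothing effect of integration; it is not the Hu-Washizu derivation.

```python
def centroid_curvature_errors(n_elem, L=100.0, P=1.0, EI=1.0e5 * 10.0 / 12.0):
    """Relative errors of strain versus displacement for a centroid-sampled curvature.

    Hypothetical illustration: the curvature of the tip-loaded cantilever,
    P (L - x) / EI, is replaced in each element by its centroid value and then
    integrated twice from the clamped root (w = theta = 0).
    """
    h = L / n_elem
    theta, w = 0.0, 0.0
    for e in range(n_elem):
        kappa = P * (L - (e + 0.5) * h) / EI     # constant best-fit curvature
        w += theta * h + 0.5 * kappa * h**2      # exact integration over element
        theta += kappa * h
    # largest pointwise curvature error is P h / (2 EI), at an element end;
    # it is measured relative to the root curvature P L / EI
    strain_error = (P * h / (2.0 * EI)) / (P * L / EI)
    w_exact = P * L**3 / (3.0 * EI)
    return strain_error, (w_exact - w) / w_exact


for n in (1, 2, 4, 8):
    print(n, centroid_curvature_errors(n))
```

The curvature error falls as $1/(2N)$, while the tip-deflection error falls as $1/(4N^2)$. The deflection obtained this way coincides with the finite element values of table 5.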


5. Conclusions 

In this paper, we postulated a few models to explain how displacement type FEM works. We 
worked out a series of simple problems of increasing complexity to establish whether our 
conjecture that strains and stresses appear in a ‘best-fit’ sense could be verified (falsified, 
in the Popperian sense) by carefully designed numerical experiments. 

An important part of this exercise depended on our careful choice and use of various
stress terms. Thus terms like $\sigma$ and $\sigma_y$ were actual or true physical states that were sought
to be modelled. The stress terms $\bar{\sigma}$ and $\bar{\sigma}_y$ were the quantities that emerged in what we
can call the ‘first-order’ tradition analysis in the language of Sir Karl Popper - where the
RR or FEM operations are mechanically carried out using functional approximation and
finite element stiffness equations respectively. We noticed certain features which seemed
to relate these computed stresses to the true system they were modelling in a predictable or
repeatable manner. We then proposed a mechanism to explain how this could have taken
place. Our bold conjecture, after examining these numerical experiments, was that the
procedure is effectively seeking a best-fit state.

To confirm that this conjecture is scientifically coherent and complete, we had to enter 
into a ‘second-order’ tradition exercise. We assumed that this is indeed the mechanism 
that is operating behind the scenes and derived quantities that will result from the best-fit
paradigm when this was applied to the true state of stress. These predicted quantities turned
out to be exactly the same as the quantities computed by the RR and FEM procedures. 
In this manner, we could satisfy ourselves that the ‘best-fit’ and stress correspondence 
paradigms had successfully survived a falsification test. 

Another important step we took was to prove that the ‘best fit’ nature of the stress
correspondence paradigm was neither gratuitous nor fortuitous. In fact, we could also establish
that this could be derived from more basic principles - in this regard, the generalized 
theorem of Hu, which is a variation of the least action principle, was found valuable to 
determine that the best-fit paradigm had a rational basis. 

One important conclusion we can derive from the best-fit nature of the stress
correspondence paradigm is that an interpolation field for the stresses $\sigma$ (or stress resultants as the
case may be) which is of higher order than the strain fields $\varepsilon$ on which it must ‘do work’
in the energy or virtual work principle is actually self-defeating because the higher order 
terms cannot be ‘sensed’. This is precisely the basis for de Veubeke’s famous limitation 
principle, that ‘it is useless to look for a better solution by injecting additional degrees of 
freedom in the stresses.’ We can see that one cannot get stresses which are of higher order 
than are reflected in the strain expressions. 
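de Veubeke's limitation principle can be seen in miniature on a single one-dimensional element, $-1 \le x \le 1$. In the hypothetical sketch below the strain field is linear. The stress is enriched with the quadratic Legendre term $P_2(x) = (3x^2 - 1)/2$, which is orthogonal to every linear function. That term therefore does no work on any admissible strain, and a best-fit projection onto the strain space discards it. The element, the coefficients and the function names are illustrative assumptions only.

```python
import numpy as np

# 5-point Gauss-Legendre rule on [-1, 1]: exact for polynomials up to degree 9
xg, wg = np.polynomial.legendre.leggauss(5)


def p2(x):
    """Legendre polynomial P2, orthogonal to 1 and x on [-1, 1]."""
    return 0.5 * (3.0 * x**2 - 1.0)


def work_of_extra_stress(a, b):
    """Work of the higher-order stress term P2(x) on a linear strain a + b x."""
    return float(np.sum(wg * p2(xg) * (a + b * xg)))


def sensed_stress(c0, c1, c2):
    """Best-fit (L2) projection of sigma = c0 + c1 x + c2 P2(x) onto span{1, x}."""
    sigma = c0 + c1 * xg + c2 * p2(xg)
    basis = np.stack([np.ones_like(xg), xg])
    G = (basis * wg) @ basis.T
    return np.linalg.solve(G, (basis * wg) @ sigma)


print(work_of_extra_stress(3.0, -7.0))   # zero up to round-off
print(sensed_stress(1.0, 2.0, 5.0))      # the c2 = 5 term is not sensed
```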

The scientist-philosopher Lewis Wolpert argued in his influential The unnatural nature
of science that

"... the world is not constructed on a common-sensical basis. ...‘natural’ thinking
- ordinary, day-to-day common sense - will never give an understanding about 
the nature of science. Scientific ideas are, with rare exceptions, counter-intuitive ... 
common sense is prone to error when applied to problems requiring rigorous and 
quantitative thinking; lay theories are highly unreliable.” 

It is very easy to see that the DC paradigm had direct commonsensical appeal. The SC 
paradigm is counter-intuitive and requires rigorous quantitative analysis to establish its 
validity. 

The author is indebted to Mr R U Vinayak for carrying out some of the computational 
work reported here. He is thankful to Dr K N Raju and Dr B R Somashekar for their 
support. The concepts reviewed here were developed partly under the support of Grants- 
in-Aid schemes of the Aeronautics Research and Development Board of the Ministry of 
Defence and this financial support is gratefully acknowledged. 
Abstract. In this paper, a unified method is presented: (i) to model delaminated
stiffened laminated composite shells; (ii) for synthesising accurate multiple
post-buckling solution paths under compressive loading; and (iii) for
predicting delamination growth. A multi-domain modelling technique is used
for modelling the delaminated stiffened shell structures. Error-free geometrically
nonlinear element formulations - a 2-noded curved stiffener element
(BEAM2) and a 3-noded shell element (SHELL3) - are used for the finite
element analysis. An accurate and simple automated solution strategy based on
Newton type iterations is used for predicting the general geometrically nonlinear
and post-buckling behaviour of structures. A simple method derived from the
3-dimensional J-integral is used for computing the pointwise energy release
rate at the delamination front in the plate/shell models. Finally, the mutual influence
of post-buckling structural behaviour and delamination growth is demonstrated.

Keywords. Multi-domain modelling; quasi-conforming elements; delamination
growth; J-integral; automated post-buckling solution.


1. Introduction 



Laminated and stiffened structures are particularly prone to interlaminar debonding
(delamination) type failures, since the interlaminar bond strength is much less than the
in-plane laminar strength. Such delaminations can be caused, at any time,
under several design and operating conditions, e.g. large transverse stresses, tapering of the
laminate, clamping in a vice, drilling a hole, low velocity impact such as dropping a tool
during maintenance, etc. (figure 1). In addition, structural fatigue and environmental factors
like moisture, temperature and corrosion often weaken the interlaminar bond strength and
hence accelerate formation/propagation of the delaminations.
Figure 1. Causes for initiation and growth of delamination failure - (a) Prestressed
plates; (b) impact/indentation loads; (c) transverse shear/normal stresses; (d) in-plane
shear/normal stresses; (e) laminate tapering; (f) ply-failure under operating conditions.
The respective causes for delamination are as follows. (a) Hygro-thermo-mechanical
compressive stresses due to fabrication and/or drawing defects; (b) plastic zone under
impact load leading to material failure and local partial layer separation; the resulting
compressive layer forces can cause delamination; (c) local bonding material failure
due to high transverse stresses often causes delamination, particularly near the free
edges, where plies debond in the opening mode; (d) high in-plane stresses cause ply-failure,
which in turn accelerates the delamination process; (e) high local bending stress
concentrations in the encircled zones may force the laminae to separate at the corners;
(f) as in (d), ply failure under any operating conditions accelerates the delamination
process.


The delaminations are particularly dangerous because: they generally reduce the overall 
laminate strength due to material discontinuity; they act as imperfections when located 
eccentrically, and thus substantially reduce the overall buckling strength of the laminate; 
and they grow rapidly under in-plane compressive loads - since the delamination often 
buckles locally much earlier than global structural buckling - resulting in a progressive 
reduction in laminate strength and increase in delamination growth rate, finally leading to 
fatal failure. Also, stiffened delaminated structures can buckle at multiple levels - local
delaminate, laminate, panel, stiffener and structural - often accelerating the delamination 
growth dramatically. 

In addition, delaminations are very often hidden, escape simple inspection, and
have a very high potential to grow under operating conditions. An a priori assessment
therefore becomes essential, so that the designer can build these considerations into his
basic design. Such an assessment covers: the nature and magnitude of delamination that
could be induced under specified circumstances; the growth rate of the delamination under
specified loading environments and structural instability; the reduction in laminate strength
due to the presence and the growth of a delamination; and possible methods for avoiding the
damage and/or arresting/controlling the damage growth. Accordingly, these problems have
been addressed widely over the last two decades from both experimental and
theoretical points of view. In this paper we shall limit ourselves to a reliable computation of
the post-buckling structural behaviour and the pointwise energy release rate
distribution along the delamination front in a laminated composite structure. The energy
release rate computed here can be used as effective feedback to the designer to check
whether an existing delamination is a potential danger from the structural integrity point
of view.

Though the delaminations are prone to grow under a variety of loading configurations, 
it is understood that they are extremely sensitive to the buckling loads. Under such loads, 
they can reduce the overall buckling strength of the laminate considerably and can also 
grow dramatically under postbuckling loads, leading to structural failure. Very often,
delaminated composites can be modelled as problems of plate-bending, using any of the
theories of plates and shells that are well-established. Furthermore, laminate deformation 
is mostly elastic and hence the different energetic measures established in linear elastic 
fracture mechanics are meaningful for characterising delamination growth. Use of these 
energetic measures in conjunction with an analysis of the post-buckling behaviour of the 
delaminated plates often results in a simple a posteriori expression for the pointwise energy 
release rate distribution along the delamination front as demonstrated in this paper. Thus, 
modelling and analysis of coupled failure mechanisms in stiffened delaminated structures 
- which were once formidable - are being reconsidered with renewed confidence in recent 
years. 

Most researchers in the past have concentrated on very thin near-surface
rectilinear and/or circular delaminations in a homogeneous isotropic material medium
(Kachanov 1976; Chai et al 1981; Bottega & Maewal 1983; Evans & Hutchinson 1984;
Yin 1984, 1985). This is because, under certain assumptions (known as ‘thin film’
assumptions), the deformations of such delaminations can be studied, in isolation, as problems of
clamped plates under compressive loads, and one can obtain quasi-analytical estimates of
delamination growth under simple loading conditions, e.g. biaxial load. Recently, attempts
were made to obtain solutions for elliptic delaminations, e.g. using the Rayleigh-Ritz
technique and certain geometric constraints to couple the post-buckling large deflections and
membrane deformation (Flanagan 1988). Generally, such solutions are valid for very thin
delaminations in thick laminates. But most practical laminate panels, for example
in aerospace applications, are generally thin, and very often the delaminate thickness is
comparable to the total laminate thickness or to the base thickness (see figure 2). Also,
such solutions (e.g. Evans & Hutchinson 1984) are generally based on assumptions of
quasi-linear local post-buckling behaviour of the delaminate plate and hence are valid
only in the vicinity of the local buckling load of the delaminate. In
practice, however, the local delaminate post-buckling behaviour is very nonlinear, and
laminated panels are often allowed to buckle even globally. Finally, the presently available
analytical or quasi-analytical solutions are limited to a single delamination of a standard
shape and location, and only in plates of standard topology and boundary conditions.

Today, one would turn to the finite element method for a general computational
analysis of arbitrarily shaped stiffened composite laminates. However, an analysis of the
growth of embedded delaminations requires 3-dimensional modelling (Whitcomb 1989)
together with a sophisticated geometrically nonlinear post-buckling solution
capability. Hence, if a finite element method is used, the analysis becomes extremely expensive
in terms of both computer memory and time. One can think of a global 2-dimensional
post-buckling analysis combined with a local 3-dimensional growth analysis using established
methods such as the alternating technique (Nishioka & Atluri 1981), the virtual crack extension
technique (Parks 1974) or the modified crack-closure technique (Rybicki & Kanninen 1977).
Even so, the procedure is more involved and still too expensive to use extensively for each
incremental solution in a cycle of finite element post-buckling solutions.

In the early 80s, a cost-effective one-dimensional model - the so-called multi-plate
model - was proposed to model a laminate with a single delamination (Chai et al 1981).
This procedure, however, can be extended to handle multiple delaminations of
different shapes and locations in general composite plates and shells as well (Naganarayana
et al 1995). Here, the delaminated structure is modelled as an assembly of three distinct
parts, namely, laminate, base and delaminate (figure 2). The well-established theories
of plates/shells can be readily used for modelling each of the three plates. Normally,
the same plate theory is used to establish the continuity conditions at the joint between
them at the delamination front. However, the one-dimensional method presented in Chai
et al (1981) for computing the energy release rate uses a simple numerical derivative of
the total potential energy and hence requires two computations - one for crack
diameter 2a and another for the extended crack diameter 2(a + da). This makes it
cumbersome and expensive to use extensively in practice, especially for 2-dimensional planar
delaminations of arbitrary shapes and locations. Today, reliable energy-based parameters
(such as the J-integral, the equivalent domain integral etc.) and computational techniques
(such as the alternating methods, virtual crack extension, modified crack-closure etc.)
are established in the field of linear elastic fracture mechanics for predicting crack
growth in a much simpler fashion. Recently, some of these techniques were extended
for characterising delamination growth in a multi-plate model, e.g. the virtual crack
extension method (Gilletta 1988), the modified crack-closure technique (Whitcomb &
Shivakumar 1989) and the VCCTS (virtual crack closure technique step-by-step) approach
(Tsao et al 1991).
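The "two computations" of the numerical-derivative approach can be illustrated on a textbook case. This sketch replaces the delaminated plate with a double cantilever beam (DCB) whose arms are Euler-Bernoulli beams, because its potential energy is known in closed form. The geometry and the helper names are hypothetical. A forward difference of the potential energy $\Pi$ over a crack extension da approximates $G = -(1/b)\,d\Pi/da$ with an error of order da/a. This shows why the step must be chosen with care when each evaluation is a full nonlinear analysis.

```python
def dcb_potential(a, P=1.0, EI=1.0):
    """Total potential energy of a double cantilever beam at fixed load P.

    Each arm is a cantilever of length a, so the load-point opening is
    delta = 2 P a^3 / (3 EI); at fixed load, Pi = -P * delta / 2.
    """
    delta = 2.0 * P * a**3 / (3.0 * EI)
    return -0.5 * P * delta


def g_finite_difference(a, da, b=1.0, P=1.0, EI=1.0):
    """G ~ -[Pi(a + da) - Pi(a)] / (b da): needs two separate analyses."""
    return -(dcb_potential(a + da, P, EI) - dcb_potential(a, P, EI)) / (b * da)


def g_exact(a, b=1.0, P=1.0, EI=1.0):
    """Closed form G = P^2 a^2 / (b EI) for the same beam model."""
    return P**2 * a**2 / (b * EI)


for da in (1.0, 0.1, 0.01):
    print(da, g_finite_difference(10.0, da), g_exact(10.0))
```

A J-integral or equivalent domain integral evaluated on a single solution avoids the second analysis and the choice of da altogether.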

In this paper, we consider the following 3-dimensional energy-based parameters to
derive simple expressions for the pointwise energy release rate at any point on the front
of an arbitrarily shaped planar delamination in composite laminates. The first is the J-integral
(Rice 1968), computed along a closed surface of infinitesimal radius enclosing the crack tip.
The second is the equivalent domain integral (Nikishkov & Atluri 1987), computed over a finite
annular volume whose inner surface, of infinitesimal radius, encloses the crack tip.
These parameters are suitably modified for the present problem of plate/shell flexure.
Using assumptions that characterise the delamination growth and plate/shell flexure,
simple expressions are derived for the pointwise energy release rate distribution along the
delamination front (Naganarayana & Atluri 1995a,b). The techniques presented here for
delamination growth prediction can be used in an a posteriori sense in conjunction with
any analytical/computational method of post-buckling analysis of plates that can take care
of appropriate multi-point constraints at the delamination front. However, in this paper, an
in-house finite element software - NONCATS: NONlinear Computational Analysis Tool for
Structural analysis (Huang et al 1995) - is used for the analysis. It incorporates curved
stiffener and shell elements, an automated nonlinear post-buckling solution, and the
multi-domain modelling technique.

Here, the multi-domain modelling technique (Naganarayana & Huang 1995) is used to
model the delaminated plates/shells. A 3-noded triangular quasi-conforming curved shell
element (SHELL3) is used for modelling the delaminated sublaminates and the
nondelaminated plate/shell. A 2-noded curved beam element (BEAM2) is used for modelling
the stiffeners. The stiffener element (Naganarayana & Prathap 1996) is developed based
on the Euler-Bernoulli theory of beam flexure in a curvilinear coordinate system. The
shell element (Huang et al 1994) is based on a classical shallow shell theory, again
described in a curvilinear coordinate system. The causes of membrane locking and nonlinear
locking are identified and eliminated from the element formulations, using reduced
integration for the membrane strain energy (Naganarayana et al 1995). The transverse
shear strain energy is included in the formulation explicitly in accordance with the
Reissner-Mindlin theory of plate flexure, with the transverse shear strains as nodal
degrees of freedom. Therefore the elements do not suffer from shear locking either. In the case of
the shell element, $C^0$-continuity is exactly preserved for the field variables. However,
the $C^1$-continuity required for the transverse deflection across the element boundaries is
achieved a posteriori in a weak form - i.e. in a quasi-conforming sense (Huang et al
1994).

An automated incremental general nonlinear and post-buckling Newton type solution 
strategy, incorporating an arc-length controlled load incrementation, and branch switching 
based on a linearised asymptotic solution (Huang & Atluri 1995), is utilised while using 
the displacement type finite element model. The stresses are post-processed for each load 
increment, to obtain pointwise energy release rate distribution along the delamination front, 
by using the adapted J-integral and equivalent domain integral approaches (Naganarayana 
& Atluri 1995a,b) mentioned above. 

In this paper, we present the complete computational strategy for structural and finite
element modelling of delaminated and/or stiffened laminated plates/shells; an automated
geometrically nonlinear and post-buckling solution strategy for the finite element model; and
delamination growth assessment in terms of the pointwise energy release rate distribution
along the delamination front. The aspects briefly discussed relate to structural
modelling, finite element formulations and possible errors involved, solution strategies
that can pass the instability points and switch the solution branches if necessary, energy
release rate prediction, and interaction between post-buckling structural deformation and
delamination growth. Also, the structure of the software NONCATS that implements these
strategies is briefly explained. Finally, a few examples are presented to demonstrate how
the present computational model functions.


2. Structural modelling 

Here, for the sake of convenience and simplicity of presentation, we shall consider a
laminated composite shell with a single delamination of an arbitrary shape and location,
subjected to arbitrary compressive loads (figure 2). The structure is modelled using the
multi-domain model (Naganarayana & Huang 1995), wherein the delaminated shell is
assumed to be assembled from three distinct shells: (1) Laminate: nondelaminated zone
$\Omega^{(1)}$; (2) Delaminate: thinner side of the delaminated zone $\Omega^{(2)}$; and (3) Base: thicker
side of the delaminated zone $\Omega^{(3)}$. The three shells, $\Omega^{(i)}$, $i = 1, 2, 3$ respectively, have
midsurface areas $A^{(i)}$; thicknesses $t^{(i)}$; boundaries $\partial\Omega^{(i)}$; and midsurface boundaries $\partial A^{(i)}$.
The delamination edge is denoted by $\Gamma$. The assumptions of the Reissner-Mindlin theory
of plate bending are used for modelling each shell and the joint between them. Thus, for
each shell, the 3-dimensional displacement field ($\mathbf{U} = \{U_1\ U_2\ U_3\}$) can be expressed
in terms of the corresponding midsurface displacement ($\mathbf{u} = \{u_1\ u_2\ u_3\}$) and rotation
($\boldsymbol{\theta} = \{\theta_1\ \theta_2\ 0\}$) fields as

U (i) 0e«, x 3 ) = u (0 (x„) - xf0 (,) (x a ), (1) 

where $x_\alpha$ ($\alpha = 1, 2$) are the in-plane curvilinear shell coordinates and $x_3^{(i)}$ is the thickness coordinate for the $i$th ($i = 1, 2, 3$) shell (figure 2). Structural continuity at the delamination front $\Gamma$ is maintained by assuming the deformation to be unique at the junction of the three shells, i.e. $\mathbf{U}^{(1)} = \mathbf{U}^{(2)} = \mathbf{U}^{(3)}$ on $\Gamma$. In other words, at the delamination edge, the midsurface degrees of freedom of the delaminate and the base shells are related to those of the nondelaminated shell by

$$u_3^{(1)} = u_3^{(2)} = u_3^{(3)}, \qquad \theta_\alpha^{(1)} = \theta_\alpha^{(2)} = \theta_\alpha^{(3)}, \qquad u_\alpha^{(i)} = u_\alpha^{(1)} + \delta^{(i)}\,\theta_\alpha^{(1)}, \qquad (2)$$

where $\delta^{(i)}$ is the distance of the midsurface of the $i$th shell from the laminate midsurface (figure 2). The above continuity conditions at the delamination edge can be modified appropriately when an alternative plate/shell theory (e.g. a higher-order shear deformation theory) is used, or replaced by appropriate heuristic multi-point constraints chosen from experience.
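The delamination-front condition acts as a multi-point constraint: the in-plane components follow $u_\alpha^{(i)} = u_\alpha^{(1)} + \delta^{(i)}\theta_\alpha^{(1)}$, while $u_3$ and $\theta_\alpha$ are shared by all three shells. The Python sketch below applies this mapping at one node. It assumes that the signed offset $\delta^{(i)}$ follows the same sign convention as that relation. The function name and argument layout are hypothetical and are not taken from NONCATS.

```python
import numpy as np


def sublaminate_dofs(u_lam, theta_lam, delta):
    """Apply the delamination-front constraint at one node.

    u_lam     : (u_1, u_2, u_3) of the laminate (shell 1) midsurface
    theta_lam : (theta_1, theta_2) rotations of the laminate
    delta     : signed offset delta^(i) of the sublaminate midsurface
                (sign convention assumed to match the constraint relation)
    Returns the sublaminate midsurface displacements and rotations.
    """
    u_lam = np.asarray(u_lam, dtype=float)
    theta_lam = np.asarray(theta_lam, dtype=float)

    u_sub = u_lam.copy()
    # in-plane components: u_a^(i) = u_a^(1) + delta^(i) * theta_a^(1)
    u_sub[:2] += delta * theta_lam
    # transverse deflection u_3 and the rotations are shared by all three shells
    return u_sub, theta_lam.copy()


# Example: a delaminate whose midsurface is offset by 0.5 from the laminate midsurface
u_d, th_d = sublaminate_dofs([1.0, 2.0, 3.0], [0.1, -0.2], delta=0.5)
print(u_d, th_d)
```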

Similarly, the beam (stiffener) degrees of freedom are related to the shell degrees of freedom such that the transverse variation of deformation across the shell and beam sections is consistent with the Reissner-Mindlin theory:

$$u_3^b = u_3^s, \qquad \theta_\alpha^b = \theta_\alpha^s, \qquad u_\alpha^b = u_\alpha^s + e\,\theta_\alpha^s, \qquad (3)$$

where superscripts $b$ and $s$ denote beam and shell degrees of freedom respectively, and $e$ is the eccentricity of the stiffener's neutral axis with reference to the neutral surface of the shell.
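The stiffener constraint $u_3^b = u_3^s$, $\theta_\alpha^b = \theta_\alpha^s$, $u_\alpha^b = u_\alpha^s + e\,\theta_\alpha^s$ is linear. It can therefore be imposed by writing the beam nodal vector as $\mathbf{d}^b = \mathbf{T}\,\mathbf{d}^s$ and condensing the beam stiffness onto the shell freedoms as $\mathbf{T}^T\mathbf{K}^b\mathbf{T}$. The sketch below assumes a hypothetical per-node ordering $(u_1, u_2, u_3, \theta_1, \theta_2)$ shared by beam and shell. The actual BEAM2/SHELL3 freedom sets are richer than this.

```python
import numpy as np


def stiffener_transform(e):
    """Transformation T with d_beam = T @ d_shell.

    Assumed per-node DOF order: (u1, u2, u3, theta1, theta2).
    e is the stiffener eccentricity.
    """
    T = np.eye(5)
    T[0, 3] = e  # u_1^b = u_1^s + e * theta_1^s
    T[1, 4] = e  # u_2^b = u_2^s + e * theta_2^s
    return T


def condense_beam_stiffness(K_beam, e):
    """Express a 5x5 nodal beam stiffness in terms of the shell DOFs."""
    T = stiffener_transform(e)
    return T.T @ K_beam @ T


T = stiffener_transform(10.0)
d_shell = np.array([1.0, 2.0, 3.0, 0.1, 0.2])
d_beam = T @ d_shell

K_b = np.diag([4.0, 4.0, 1.0, 2.0, 2.0])  # illustrative values only
K_s = condense_beam_stiffness(K_b, 10.0)
```

Because $\mathbf{d}^{sT}(\mathbf{T}^T\mathbf{K}^b\mathbf{T})\mathbf{d}^s = \mathbf{d}^{bT}\mathbf{K}^b\mathbf{d}^b$, the condensed matrix stores the same strain energy as the stiffener itself.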

In this paper, a 2-noded curved beam element (BEAM2) (figure 3) and a 3-noded curved shell element (SHELL3) (figure 4) are used for modelling stiffened structures. The elements are described in a curvilinear coordinate system and are based on a quasi-conforming formulation (Huang et al 1994).

Figure 3. BEAM2: 2-noded curved stiffener/beam element.

In the current formulation, a classical $C^1$-continuous field description is used, and the transverse shear strain components are introduced as independent generalised degrees of freedom, in conformity with the Reissner-Mindlin theory:

$$\gamma_{\alpha 3} = \left(w_{,\alpha} + b_\alpha^\beta u_\beta\right) - \theta_\alpha = \phi_\alpha - \theta_\alpha, \qquad (4)$$

where $b_{\alpha\beta}$ is the curvature tensor of the shell's midsurface. Substituting (1) and (4) into the regular 3-dimensional strain tensor, we get the membrane, flexural and transverse shear strain components, respectively, as

$$\epsilon_{\alpha\beta} = \tfrac{1}{2}\left(u_{\alpha,\beta} + u_{\beta,\alpha}\right) + \tfrac{1}{2}\phi_\alpha\phi_\beta - b_{\alpha\beta}\,w,$$
$$\chi_{\alpha\beta} = \tfrac{1}{2}\left(\gamma_{\alpha 3,\beta} + \gamma_{\beta 3,\alpha}\right) - \tfrac{1}{2}\left(\phi_{\alpha,\beta} + \phi_{\beta,\alpha}\right),$$
$$\epsilon_{\alpha 3} = \tfrac{1}{2}\gamma_{\alpha 3}. \qquad (5)$$

The finite element formulations therefore use seven degrees of freedom, $u_1$, $u_2$, $w$, $w_{,1}$, $w_{,2}$, $\gamma_{13}$ and $\gamma_{23}$, to define the strain components and hence the structural deformation. The curved shell element (SHELL3) thus needs $C^0$-continuous interpolation functions (e.g. Lagrangian) for the in-plane displacement components ($u_1$, $u_2$) and the transverse shear strain components ($\gamma_{13}$, $\gamma_{23}$), and $C^1$-continuous interpolation functions (e.g. Hermitian) for the transverse deflection ($w$). A compatible field description is used in the curved beam formulation as well.
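The strain measures above can be evaluated pointwise once the field values and their gradients are known. The Python sketch below does this under two stated assumptions. First, the local in-plane axes are orthonormal, so $b_\alpha^\beta$ and $b_{\alpha\beta}$ coincide and covariant derivatives reduce to partial derivatives. Second, the membrane strain takes the moderate-rotation form $\epsilon_{\alpha\beta} = \tfrac12(u_{\alpha,\beta}+u_{\beta,\alpha}) + \tfrac12\phi_\alpha\phi_\beta - b_{\alpha\beta}w$, with $\phi_\alpha = w_{,\alpha} + b_\alpha^\beta u_\beta$. The flexural strain is computed in the equivalent form $-\tfrac12(\theta_{\alpha,\beta}+\theta_{\beta,\alpha})$, which follows from $\theta_\alpha = \phi_\alpha - \gamma_{\alpha 3}$. The function name is hypothetical.

```python
import numpy as np


def shell_strains(grad_u, u, w, grad_w, theta, grad_theta, b):
    """Membrane, flexural and transverse shear strains at a point.

    grad_u[a, c]     = u_{a,c}       (in-plane displacement gradient, 2x2)
    u[a]             = u_a           (in-plane displacements, 2)
    w                = transverse deflection
    grad_w[a]        = w_{,a}
    theta[a]         = rotation theta_a
    grad_theta[a, c] = theta_{a,c}
    b[a, c]          = curvature tensor b_{ac} (orthonormal axes assumed)
    """
    grad_u = np.asarray(grad_u, dtype=float)
    grad_theta = np.asarray(grad_theta, dtype=float)
    b = np.asarray(b, dtype=float)

    phi = np.asarray(grad_w, dtype=float) + b @ np.asarray(u, dtype=float)
    gamma = phi - np.asarray(theta, dtype=float)          # gamma_{a3}

    eps = 0.5 * (grad_u + grad_u.T) + 0.5 * np.outer(phi, phi) - b * w
    chi = -0.5 * (grad_theta + grad_theta.T)               # equals the gamma/phi form
    eps_a3 = 0.5 * gamma
    return eps, chi, eps_a3, gamma


# Flat plate (b = 0) with a small stretch and a slight shear deformation
eps, chi, eps_a3, gamma = shell_strains(
    grad_u=[[0.01, 0.0], [0.0, 0.0]], u=[0.0, 0.0], w=0.0,
    grad_w=[0.1, 0.0], theta=[0.08, 0.0],
    grad_theta=np.zeros((2, 2)), b=np.zeros((2, 2)),
)
```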



Figure 4. SHELL3: 3-noded curved shell element.
It is interesting to note here that the transverse shear strains vanish in a variationally correct sense as the structural thickness decreases, and the effect of transverse shear deformation vanishes from the flexural strains in a consistent manner. Thus, such an element formulation is free of shear locking.

However, these elements suffer from membrane locking when used to model curved structures, particularly in the regime of inextensional bending (Babu & Prathap 1988). This is due to the inconsistent participation of the terms $b_{\alpha\beta}w$ with reference to the basic membrane strains $\tfrac{1}{2}(u_{\alpha,\beta} + u_{\beta,\alpha})$. Similarly, inconsistent participation of the nonlinear terms $\tfrac{1}{2}\phi_\alpha\phi_\beta$ with reference to the basic membrane strain components leads to the so-called nonlinear locking (Naganarayana & Prathap 1996) when such elements are used to model geometrically nonlinear systems in the limit of inextensional bending. In the curved beam and shell elements considered here, the membrane strain energy is computed using a reduced order of Gaussian quadrature, so that both locking phenomena are eliminated (Naganarayana et al 1995). This remedy is based on the understanding gained from a one-dimensional beam element formulation (Naganarayana & Prathap 1996).

The shell element is required to satisfy $C^1$-continuity requirements (for the transverse deflections) over the element domain as well as across the element boundaries. $C^1$-continuous shape functions (e.g. Hermitian) are used to interpolate the transverse deflection over the element domain. The $C^1$-continuity requirements across the element boundaries are, however, satisfied a posteriori in a weak form (a quasi-conforming field description) using the Hu-Washizu variational principle (Huang et al 1994).

As mentioned in the previous section, the sublaminate degrees of freedom (at the delamination front) and the stiffener degrees of freedom are related to the corresponding laminate degrees of freedom through multi-point constraints in accordance with the Reissner-Mindlin theory of plate flexure. One may refer to Naganarayana et al (1995), Naganarayana & Prathap (1996) and Huang et al (1994) for a detailed description of the element formulations and the finite element modelling/analysis of delaminated stiffened composite structures.


3. Delamination growth prediction 

Delamination is a typical form of failure in laminated structures, occurring purely due to failure of the interlaminar bond. Normally, in laminated composite structures, the interlaminar bond strength is much lower than the laminar strength. Thus, unlike other forms of failure (e.g. inter-laminar cracks, spalling etc.), which may start and grow under severe loads and/or fatigue, delaminations may occur at much lower loads and can grow very rapidly even under normal maintenance and operating conditions (figure 1), leading to structural failure. In addition, they very often escape visual inspection. Therefore, extra care has to be taken to contain delamination formation and growth.

In this section, a computational model is derived to predict delamination growth in terms of the pointwise energy release rate. It is assumed that delaminations start and grow in the interlaminar bond region. The delamination and its growth therefore take place in a homogeneous medium, so the growth can be assumed to be self-similar. Hence, the J-integral (or the equivalent domain integral), which represents only self-similar crack growth, is meaningful in the present case. Here, the 3-dimensional J-integral and the equivalent domain integral are used to compute the strain energy release rates.

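The pointwise energy release rate for 3-dimensional self-similar crack growth, $G(\Gamma)$, is defined as (Atluri 1986)

$$G(\Gamma)\,\Delta\Gamma = \lim_{\epsilon\to 0}\int_{A_\epsilon}\left(W n_1 - \sigma_{\alpha\beta}\,n_\beta\,\frac{\partial U_\alpha}{\partial \bar{x}_1}\right)dA + \int_{A_1}\sigma_{\alpha 2}\,\frac{\partial U_\alpha}{\partial \bar{x}_1}\,dA - \int_{A_2}\sigma_{\alpha 2}\,\frac{\partial U_\alpha}{\partial \bar{x}_1}\,dA, \qquad (6)$$

where $\alpha, \beta = 1, 2, 3$; $W$ is the strain energy density; $A_\epsilon$ is the area of the tube of radius $\epsilon$ enclosing the crack front; $A_1$ and $A_2$ are the areas covering the ends of the tube (figure 5); and $\sigma$, $\mathbf{U}$ and $\mathbf{n}$ are defined in the crack tip coordinate system $\bar{x}$ (figure 2).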

For self-similar crack growth in homogeneous media, the path-independence of the J-integral is maintained (Naganarayana & Atluri 1995a,b), and hence the infinitesimal tube enclosing the crack edge can have a cross-section of any shape. Consider a rectangular tube that encloses the delamination front and passes through the nearest stress recovery points $S^{(i)}$ of the adjoining elements (figure 5). We then apply the assumptions of the theory of plate flexure used to model the laminate and the delaminated sublaminates, and carry out the integration through the thickness of each sublaminate. This gives the pointwise energy release rate as a simple function of the stress resultants, the displacement gradients and the strain energy densities at the points $S^{(i)}$:

$$G_g(\Gamma) = \Psi_g\left[W - \left(N_{1\alpha}u_{\alpha,1} + M_{1\alpha}\theta_{\alpha,1} + Q_{13}u_{3,1}\right)\right], \qquad (7)$$

where $\Psi_g(*) = (*)_{g(1)} - (*)_{g(2)} - (*)_{g(3)}$ and $(*)_{g(i)}$ corresponds to the quantity $(*)$ of the $i$th shell evaluated at specified points (generally Gauss points) on the annular surface.

It is interesting to observe that, if the rectangular tube is shrunk onto the surface along the thickness coordinate at the delamination edge, we get the pointwise energy release rate in terms of the nodal values as

$$G_n(\Gamma) = \Psi_n\left[W - \left(N_{1\alpha}u_{\alpha,1} + M_{1\alpha}\theta_{\alpha,1} + Q_{13}u_{3,1}\right)\right], \qquad (8)$$

where $\Psi_n(*) = (*)_{n(1)} - (*)_{n(2)} - (*)_{n(3)}$ and $(*)_{n(i)}$ corresponds to the quantity $(*)$ evaluated at specified nodes.
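The Gauss-point and nodal forms share the same structure. For each shell, the energy-flux integrand $W - (N_{1\alpha}u_{\alpha,1} + M_{1\alpha}\theta_{\alpha,1} + Q_{13}u_{3,1})$ is evaluated and then combined through the difference operator $\Psi(*) = (*)_{(1)} - (*)_{(2)} - (*)_{(3)}$, i.e. laminate minus delaminate minus base. A minimal Python sketch of that bookkeeping follows. It assumes that the stress resultants and gradients have already been transformed to the crack tip coordinate system, and that $W$ denotes the strain energy density integrated through the thickness. The dictionary keys and function names are hypothetical.

```python
import numpy as np


def energy_flux(W, N1, du1, M1, dtheta1, Q13, du3_1):
    """Integrand W - (N_1a u_a,1 + M_1a theta_a,1 + Q_13 u_3,1), summed over a = 1, 2."""
    return W - (np.dot(N1, du1) + np.dot(M1, dtheta1) + Q13 * du3_1)


def pointwise_G(states):
    """Apply Psi(*) = (*)_(1) - (*)_(2) - (*)_(3) to the per-shell fluxes.

    states maps 'laminate', 'delaminate' and 'base' to keyword arguments
    of energy_flux, sampled at matching points (Gauss points or nodes).
    """
    flux = {name: energy_flux(**vals) for name, vals in states.items()}
    return flux["laminate"] - flux["delaminate"] - flux["base"]


zero2 = [0.0, 0.0]
states = {
    "laminate":   dict(W=1.0, N1=[2.0, 0.0], du1=[0.1, 0.0], M1=zero2,
                       dtheta1=zero2, Q13=0.0, du3_1=0.0),
    "delaminate": dict(W=0.3, N1=[0.5, 0.0], du1=[0.2, 0.0], M1=zero2,
                       dtheta1=zero2, Q13=0.0, du3_1=0.0),
    "base":       dict(W=0.2, N1=zero2, du1=zero2, M1=zero2,
                       dtheta1=zero2, Q13=0.0, du3_1=0.0),
}
G = pointwise_G(states)   # illustrative numbers only
```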

However, it is well known that in finite element analysis the stresses, displacement gradients and strain energy density are more accurate at the optimal stress recovery points (Barlow 1976; Prathap 1993) than at the nodes. Therefore, $G_g(\Gamma)$ from (7), with the J-enclosure passing through the Barlow points, is the more reliable estimate.

The stress resultants $(N, M, Q)$ and displacement gradients $U_{\alpha,\beta}$ in the crack tip coordinate system, as required in (7) and (8), can be obtained from their global Cartesian counterparts. This is done by applying the regular tensorial transformations between the reference coordinate system $x$ and the crack tip coordinate system $\bar{x}$.
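For a locally flat patch of shell, the in-plane part of these transformations reduces to a rotation about the shell normal. The Python sketch below assumes that $\bar{x}_1$ is the in-plane normal to the delamination front, at angle `phi` to the reference axis $x_1$. Second-order quantities such as $N$, $M$ and the in-plane displacement gradient transform as $\mathbf{R}(\cdot)\mathbf{R}^T$, and first-order ones such as $Q$ as $\mathbf{R}Q$. Curved-shell metric effects are ignored, and the function names are hypothetical.

```python
import numpy as np


def rotation(phi):
    """Rows are the crack-tip axes (x1_bar, x2_bar) expressed in the reference axes."""
    c, s = np.cos(phi), np.sin(phi)
    return np.array([[c, s], [-s, c]])


def to_crack_tip(phi, N, M, Q, grad_u):
    """Rotate in-plane resultants and gradients into the crack tip system."""
    R = rotation(phi)
    N_bar = R @ np.asarray(N, dtype=float) @ R.T
    M_bar = R @ np.asarray(M, dtype=float) @ R.T
    Q_bar = R @ np.asarray(Q, dtype=float)
    grad_u_bar = R @ np.asarray(grad_u, dtype=float) @ R.T   # in-plane u_{a,b}
    return N_bar, M_bar, Q_bar, grad_u_bar


# At the point theta = 90 deg of a central circular delamination the front normal is x2
N_bar, M_bar, Q_bar, G_bar = to_crack_tip(
    np.pi / 2, N=np.diag([1.0, 0.0]), M=np.zeros((2, 2)),
    Q=[1.0, 0.0], grad_u=np.zeros((2, 2)),
)
```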
Figure 5. J-integral for self-similar delamination growth in a plate/shell model: (a) the J-enclosure; (b) the finite element model; (c) the modified J-enclosure for the shell model.


Recently, the 3-dimensional equivalent domain integral (EDI) interpretation of the J-integral (Nikishkov & Atluri 1987) was adapted to the present problem by choosing appropriate enclosures around the crack tip and the so-called $s$-functions in accordance with the Reissner-Mindlin theory of flexure (Naganarayana & Atluri 1995a). This gives the pointwise energy release rate as a weighted average function of the stress resultants, the displacement gradients and the strain energy densities in the elements adjacent to the delamination edge:

$$G_e(\Gamma) = \Psi_e\left[W - \left(N_{1\alpha}u_{\alpha,1} + M_{1\alpha}\theta_{\alpha,1} + Q_{13}u_{3,1}\right)\right], \qquad (9)$$

where

$$\Psi_e(*) = \frac{1}{A_1}\int_{a_1}(*)\,dA - \frac{1}{A_2}\int_{a_2}(*)\,dA - \frac{1}{A_3}\int_{a_3}(*)\,dA,$$

and $A_i$ denotes the area of the domain $a_i$ of the $i$th shell.

Again, by applying the required tensorial transformations to the global stress resultants $(N, M, Q)$ and displacement gradients $U_{\alpha,\beta}$, their counterparts in the crack tip coordinate system can be computed.
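When the domain integrals of the EDI form are evaluated by Gaussian quadrature over the elements adjacent to the front, each term $\frac{1}{A_i}\int_{a_i}(*)\,dA$ becomes a weighted sum over sampling points divided by the sum of the weights. The Python sketch below assumes that each weight is a Gauss weight multiplied by the Jacobian, so that the weights of domain $a_i$ sum to its area $A_i$. It omits any non-uniform $s$-function weighting, and the names are hypothetical.

```python
import numpy as np


def domain_average(values, weights):
    """(1/A) * integral over a of (*) dA, with A = sum of the quadrature weights."""
    values = np.asarray(values, dtype=float)
    weights = np.asarray(weights, dtype=float)
    return np.sum(weights * values) / np.sum(weights)


def psi_e(laminate, delaminate, base):
    """Psi_e(*) for (values, weights) samples from domains a_1, a_2, a_3."""
    return (domain_average(*laminate)
            - domain_average(*delaminate)
            - domain_average(*base))


# Integrand values W - (N u + M theta + Q w) at Gauss points (illustrative numbers)
G_e = psi_e(
    laminate=([1.0, 3.0], [1.0, 1.0]),
    delaminate=([0.5], [2.0]),
    base=([0.2, 0.4], [1.0, 3.0]),
)
```

With a single sampling point per domain, this average reduces to the pointwise difference used in the J-integral form.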

Thus the pointwise energy release rate derived from the equivalent domain integral approach may be more meaningful than that derived from the regular J-integral approach. The former better captures the variation of different parameters along the normal to the crack front in the vicinity of the crack tip. However, when a constant stress/strain element is used to model the problem, the energy release rate computed from (9) becomes identical to that derived directly from the J-integral ((7) and (8)).

The present exercise provides the energy release rate as a design parameter for estimating the critical loading condition of a laminated composite structure with a specified embedded delamination. The critical energy release rate that an interlaminar bond can withstand may be obtained from an appropriate material database.
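This design parameter can be used in a simple way, offered here as an assumption rather than as the procedure of the paper. Scan the load factors of the converged increments, then interpolate linearly to find the load factor at which the maximum pointwise energy release rate along the front first reaches the critical value $G_c$. The Python sketch and its names are hypothetical.

```python
import numpy as np


def critical_load_factor(load_factors, G_max, G_c):
    """First load factor at which max G along the front reaches G_c.

    load_factors : increasing load factors of converged increments
    G_max        : maximum pointwise energy release rate at each increment
    Returns None if G_c is never reached.
    """
    lam = np.asarray(load_factors, dtype=float)
    G = np.asarray(G_max, dtype=float)
    hits = np.nonzero(G >= G_c)[0]
    if hits.size == 0:
        return None
    k = hits[0]
    if k == 0:
        return float(lam[0])
    # linear interpolation between increments k-1 (below G_c) and k (at or above)
    return float(lam[k - 1] + (G_c - G[k - 1]) * (lam[k] - lam[k - 1]) / (G[k] - G[k - 1]))


lam_c = critical_load_factor([1.0, 2.0, 3.0], [0.0, 0.5, 1.5], G_c=1.0)
```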

4. Incremental nonlinear and post-buckling solution strategies 

In a delaminated structure, the delamination normally acts as a geometric imperfection that makes the structure susceptible to buckling under compressive loads. Very often, the delamination configuration is such that the delaminated sublaminate(s) buckle locally much earlier than the laminate/structure buckles globally. The locally buckled sublaminates often lead to premature global buckling, since the original geometric imperfections are now highly accentuated. Thus, the structure may experience multiple, highly coupled post-buckling deformations and the simultaneous occurrence of different types of instabilities (limit or bifurcation points). Moreover, the post-buckling structural performance interacts with delamination growth. Therefore, an automated incremental nonlinear solution strategy is very important for tracing the multiple post-buckling deformation modes in a delaminated stiffened composite structure.

In the finite element context, such an exercise normally involves:
- setting up the incremental equilibrium equations for the structure;
- appropriate iteration techniques (e.g. Newton-Raphson) with a regular solver (e.g. Gauss elimination) for obtaining incremental solutions;
- an appropriate initial load increment to avoid divergence and to ensure progressive path tracing;
- automated identification and classification of singular points (limit points and bifurcation points);
- automated branch switching to trace the desired post-buckling solution in the case of bifurcation problems;
- monitored equilibrium increments for assured convergence in spite of the presence of singular points;
- finally, obtaining the required incremental solutions (displacements and stresses).
Generally, in incremental nonlinear finite element analysis (FEA), a new incremental solution is sought at the unknown point $(\mathbf{q} + \Delta\mathbf{q}, \lambda + \Delta\lambda)$. It is obtained by solving the incremental equilibrium equation iteratively from a known solution point $(\mathbf{q}, \lambda)$, and the convergence of the solution is verified using the total equilibrium condition. In FEA, the total equilibrium condition at a solution point is given by

$$[\mathbf{K}_s]\cdot\mathbf{q} - \lambda\mathbf{F} = \mathbf{0}, \qquad (10)$$

where $[\mathbf{K}_s]$ is the secant stiffness (or simply stiffness) matrix of the system. Equilibrium at the unknown point is equivalently represented by the incremental equilibrium equation at the known point,

$$[\mathbf{K}_t]\cdot\Delta\mathbf{q} - \Delta\lambda\,\mathbf{F} = \mathbf{0}, \qquad (11)$$

where $\mathbf{q}$ and $\lambda$ are the nodal displacement vector and the load factor at the known solution point, while $\Delta\mathbf{q}$ and $\Delta\lambda$ are their incremental values. $\mathbf{F}$ is the vector of discretized reference nodal forces (typically as specified in the input for the problem), and $[\mathbf{K}_t]$ is the tangent stiffness matrix of the system.


4.1 Iterative nonlinear solution 


In a nonlinear system, the incremental solution (11) is sought iteratively (e.g. by Newton-Raphson iterations), and the total equilibrium condition (10) is used to verify convergence of the incremental solution. In every increment, the new solution is sought by incrementing the load factor by a specified step:

$$\lambda_n^i = \lambda_{n-1} + \Delta\lambda_n^i, \qquad \mathbf{q}_n^i = \mathbf{q}_{n-1} + \Delta\mathbf{q}_n^i, \qquad (12)$$

where $(*)_n^i$ represents the quantity $(*)$ corresponding to the $i$th iterative cycle during the $n$th incremental solution, and $(*)_n$ represents the converged quantity $(*)$ at the end of the $n$th increment.
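The Python sketch below shows this incremental scheme in its simplest, load-controlled form with modified Newton-Raphson iterations. The tangent stiffness is formed once per increment, the incremental equation $[\mathbf{K}_t]\Delta\mathbf{q} = \Delta\lambda\,\mathbf{F}$ gives the predictor, and the out-of-balance force $\mathbf{f}_{int}(\mathbf{q}) - \lambda\mathbf{F}$ drives the corrections, with $\mathbf{f}_{int} = [\mathbf{K}_s]\cdot\mathbf{q}$. It is only an illustration under these assumptions. The arc-length control described below is needed to pass limit points, which this loop cannot do. The function names and the one-degree-of-freedom hardening spring are hypothetical.

```python
import numpy as np


def modified_newton_raphson(f_int, K_t, F, q0, load_factors, tol=1e-10, max_iter=100):
    """Load-controlled incremental solution with a frozen tangent per increment."""
    F = np.asarray(F, dtype=float)
    q = np.array(q0, dtype=float)
    lam_prev = 0.0
    path = []
    for lam in load_factors:
        Kt = K_t(q)                                        # tangent at the last converged point
        q = q + np.linalg.solve(Kt, (lam - lam_prev) * F)  # predictor from the incremental equation
        for _ in range(max_iter):
            R = f_int(q) - lam * F                         # out-of-balance force (total equilibrium)
            if np.linalg.norm(R) <= tol * max(1.0, np.linalg.norm(lam * F)):
                break
            q = q - np.linalg.solve(Kt, R)                 # corrector with the same Kt
        else:
            raise RuntimeError(f"no convergence at load factor {lam}")
        path.append((lam, q.copy()))
        lam_prev = lam
    return path


# One-DOF hardening spring: internal force q + q^3, unit reference load
path = modified_newton_raphson(
    f_int=lambda q: q + q**3,
    K_t=lambda q: np.diag(1.0 + 3.0 * q**2),
    F=[1.0], q0=[0.0], load_factors=[0.5, 1.0, 1.5],
)
lam_end, q_end = path[-1]
```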

This step involves the selection of an appropriate initial load increment $\Delta\lambda_n^1$ for the first iterative cycle of the incremental solution. The choice of the initial increment should reflect the current degree of nonlinearity. If it is too large, the solution converges slowly or may not converge at all; if it is too small, the solution becomes inefficient from a computational point of view. Several strategies are presented in the literature for automatic initial-increment control, based on the convergence history (Crisfield 1981; Ramm 1981) and on the so-called current stiffness parameter (Bergan et al 1978; Chan 1988). In this paper, the initial arc-length increment is chosen based on the convergence history and the previous initial arc-length increment. The cumulative displacements and the load level at the end of the first iteration are then

$$\lambda_n^1 = \lambda_{n-1} \pm \Delta\lambda_{n-1}^1\,\frac{(\mathbf{s}_{n-1}\cdot\mathbf{s}_{n-1})^{1/2}}{(\mathbf{s}_n\cdot\mathbf{s}_n)^{1/2}}\left(\frac{I_e}{I_{n-1}}\right)^{\gamma},$$
$$\mathbf{q}_n^1 = \mathbf{q}_{n-1} \pm \Delta\lambda_{n-1}^1\,\frac{(\mathbf{s}_{n-1}\cdot\mathbf{s}_{n-1})^{1/2}}{(\mathbf{s}_n\cdot\mathbf{s}_n)^{1/2}}\left(\frac{I_e}{I_{n-1}}\right)^{\gamma}\mathbf{s}_n, \qquad (13)$$
Figure 6. Incremental solution with controlled equilibrium iterations.


where $\mathbf{s}_n$ is the reference solution corresponding to the reference load vector $\mathbf{F}_n$, computed as $\mathbf{s}_n = [\mathbf{K}_t]^{-1}\mathbf{F}_n$. $I_e$ is the expected number of iterations for convergence in general, $I_{n-1}$ is the number of iterations taken for convergence in the previous incremental solution, and $\gamma$ is a parameter that takes a value from 0.5 to 1.0.
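The initial-increment rule can be read as follows. The previous initial increment is rescaled by $(\mathbf{s}_{n-1}\cdot\mathbf{s}_{n-1})^{1/2}/(\mathbf{s}_n\cdot\mathbf{s}_n)^{1/2}$, so that the first-iteration arc length $\Delta\lambda_n^1(\mathbf{s}_n\cdot\mathbf{s}_n)^{1/2}$ matches the previous one. It is then multiplied by $(I_e/I_{n-1})^\gamma$, which shrinks the step after slow convergence and enlarges it after fast convergence. The first iterate follows as $\lambda_n^1 = \lambda_{n-1} + \Delta\lambda_n^1$ and $\mathbf{q}_n^1 = \mathbf{q}_{n-1} + \Delta\lambda_n^1\mathbf{s}_n$. A Python sketch follows. The default values of $I_e$ and $\gamma$ and the function names are assumptions. The caller is assumed to choose the sign, for example from the sign of the tangent stiffness determinant.

```python
import numpy as np


def initial_load_increment(dlam_prev, s_prev, s_n, I_prev, I_e=4, gamma=0.5, sign=1.0):
    """Signed first-iteration load increment from the convergence-history rule."""
    s_prev = np.asarray(s_prev, dtype=float)
    s_n = np.asarray(s_n, dtype=float)
    # keep the first-iteration arc length ||dlam * s|| equal to the previous one ...
    length_ratio = np.sqrt(s_prev @ s_prev) / np.sqrt(s_n @ s_n)
    # ... then adapt it to the previous convergence history
    history = (I_e / I_prev) ** gamma
    return sign * abs(dlam_prev) * length_ratio * history


def first_iterate(lam_prev, q_prev, dlam, s_n):
    """lambda_n^1 and q_n^1 from a given first load increment."""
    return lam_prev + dlam, np.asarray(q_prev, dtype=float) + dlam * np.asarray(s_n, dtype=float)


# Previous step: increment 0.1, reference solution of length 5, 20 iterations taken
dlam = initial_load_increment(0.1, [3.0, 4.0], [0.0, 10.0], I_prev=20, I_e=5, gamma=0.5)
lam1, q1 = first_iterate(1.0, [0.2, 0.3], dlam, [0.0, 10.0])
```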

Invariably, the above solution does not satisfy the total equilibrium condition (10) when the structural behaviour is nonlinear, and hence additional iterative cycles are required to restore equilibrium. In this paper, modified Newton-Raphson iterations are used for solving the incremental equilibrium equations (11). The conventional iteration strategy at a constant load increment exhibits a low convergence rate and may not converge near a limit point. On the other hand, if the load increment is allowed to vary as in figure 6, the convergence rate is enhanced and limit points can be traversed successfully, since the solution is forced to converge along a constrained path. Several equilibrium iteration control strategies have been proposed in the literature, with varied success:
- pure displacement control (Powell & Simons 1981; Bergan & Mollestad 1984);
- hybrid increment or arc-length control (Riks 1979, 1984; Wempner 1979; Crisfield 1981);
- minimum residual force norm (Bergan 1980);
- minimum residual displacement norm (Chan 1988);
- constant external work norm (Powell & Simons 1981);
- constant weighted response norm (Gierlinski & Smith 1985).

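Here, the most efficient of these strategies, arc-length continuation, is used for determining the load increment at each equilibrium iteration. The constraint equation for computing the load increment during the current iteration, $\Delta\lambda_n^i$, is then

$$\left(\mathbf{q}_n^i - \mathbf{q}_{n-1}\right)\cdot\left(\mathbf{q}_n^i - \mathbf{q}_{n-1}\right) + \left(\lambda_n^i - \lambda_{n-1}\right)^2\,\mathbf{F}_n\cdot\mathbf{F}_n = \left(\Delta\lambda_n^1\right)^2\,\mathbf{s}_n\cdot\mathbf{s}_n, \qquad (14)$$

where the current load parameter and displacement vector are expressed, respectively, as

$$\lambda_n^i = \lambda_n^{i-1} + \delta\lambda_n^i, \qquad \mathbf{q}_n^i = \mathbf{q}_n^{i-1} + \delta\mathbf{q}_n^i, \qquad (15)$$

with $\delta\lambda_n^i$ and $\delta\mathbf{q}_n^i$ the iterative corrections. The correction $\delta\mathbf{q}_n^i$ obtained from the modified Newton-Raphson step is linear in $\delta\lambda_n^i$ (a residual-force part plus $\delta\lambda_n^i\,\mathbf{s}_n$). Substituting (15) into (14) therefore yields an equation that is quadratic in $\delta\lambda_n^i$. This equation can be readily solved to compute the current load increment $\Delta\lambda_n^i = \lambda_n^i - \lambda_{n-1}$.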
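The quadratic can be written out explicitly for the constraint $(\mathbf{q}_n^i-\mathbf{q}_{n-1})\cdot(\mathbf{q}_n^i-\mathbf{q}_{n-1}) + (\lambda_n^i-\lambda_{n-1})^2\,\mathbf{F}_n\cdot\mathbf{F}_n = \Delta l^2$. The Python sketch below assumes that the displacement correction has the form $\delta\mathbf{q} = \delta\bar{\mathbf{q}} + \delta\lambda\,\mathbf{s}_n$, where $\delta\bar{\mathbf{q}} = -[\mathbf{K}_t]^{-1}\mathbf{R}$ is the correction due to the residual force $\mathbf{R}$. It takes the squared arc length $\Delta l^2$ as a given input. Of the two roots, it keeps the one whose displacement increment makes the smallest angle with the previous iterate, a common choice in Crisfield-type spherical arc-length methods. The paper does not state its root-selection rule, and the function name is hypothetical.

```python
import numpy as np


def arc_length_correction(dq_acc, dlam_acc, dq_res, s, F, dl2):
    """Solve the arc-length constraint for the load correction.

    dq_acc   : q_n^{i-1} - q_{n-1}   (accumulated displacement increment)
    dlam_acc : lambda_n^{i-1} - lambda_{n-1}
    dq_res   : residual-force correction, -K_t^{-1} R
    s        : reference solution s_n = K_t^{-1} F_n
    F        : reference load vector F_n
    dl2      : squared arc length (right-hand side of the constraint)
    """
    dq_acc, dq_res, s, F = (np.asarray(v, dtype=float) for v in (dq_acc, dq_res, s, F))
    a = dq_acc + dq_res              # displacement increment if the load correction were zero
    FF = F @ F
    # (a + x s).(a + x s) + (dlam_acc + x)^2 F.F = dl2  ->  A x^2 + B x + C = 0
    A = s @ s + FF
    B = 2.0 * (a @ s + dlam_acc * FF)
    C = a @ a + dlam_acc**2 * FF - dl2
    disc = B * B - 4.0 * A * C
    if disc < 0.0:
        raise ValueError("no real root: reduce the arc length and restart the increment")
    roots = [(-B + np.sqrt(disc)) / (2.0 * A), (-B - np.sqrt(disc)) / (2.0 * A)]

    # pick the root whose new increment points most nearly along the previous one
    def cosine(x):
        new = a + x * s
        return (new @ dq_acc) / (np.linalg.norm(new) * np.linalg.norm(dq_acc) + 1e-300)

    return max(roots, key=cosine)


x = arc_length_correction(dq_acc=[0.5, 0.0], dlam_acc=0.5, dq_res=[0.0, 0.0],
                          s=[1.0, 0.0], F=[1.0, 0.0], dl2=1.0)
```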

4.2 Automated post-buckling path tracing 

Automated post-buckling involves: detection of possible unstable behaviour and the choice 
of appropriate initial-increment direction so that the solution path is not retraced; classi¬ 
fication of the detected unstable behaviour of the structure; and branch-switching and 
computation of the post-buckling solution(s). 

In the present work, a singular point is detected during the current increment if the determinant of the tangent stiffness matrix, $\|\mathbf{K}_t\|_n$, changes its sign. Once the tangent stiffness matrix is decomposed as $(\mathbf{K}_t)_n = (\mathbf{L}\cdot\mathbf{D}\cdot\mathbf{L}^T)_n$, we have

$$\|\mathbf{K}_t\|_n = \prod_{i=1}^{ndof} (D_{ii})_n. \qquad (16)$$
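In practice only the signs of the pivots $D_{ii}$ are needed. This avoids overflow or underflow in the product of many pivots. By Sylvester's law of inertia, the number of negative pivots also equals the number of negative eigenvalues of $\mathbf{K}_t$. The Python sketch below uses a plain $\mathbf{L}\mathbf{D}\mathbf{L}^T$ factorisation without pivoting. This is an assumption that holds when the leading principal minors are nonzero; production solvers typically use more robust (pivoted or sparse) factorisations. The function names are hypothetical.

```python
import numpy as np


def ldlt_pivots(K):
    """Diagonal D of K = L D L^T (symmetric K, no pivoting)."""
    K = np.array(K, dtype=float)
    n = K.shape[0]
    L = np.eye(n)
    D = np.zeros(n)
    for j in range(n):
        D[j] = K[j, j] - np.sum(L[j, :j] ** 2 * D[:j])
        if D[j] == 0.0:
            raise ZeroDivisionError("zero pivot: matrix is singular or needs pivoting")
        for i in range(j + 1, n):
            L[i, j] = (K[i, j] - np.sum(L[i, :j] * L[j, :j] * D[:j])) / D[j]
    return D


def stiffness_signature(K):
    """Sign of det(K) as a product of pivot signs, and the number of negative pivots."""
    D = ldlt_pivots(K)
    return int(np.prod(np.sign(D))), int(np.sum(D < 0.0))


def singular_point_crossed(K_prev, K_curr):
    """True if det(K_t) changed sign between two increments."""
    return stiffness_signature(K_prev)[0] != stiffness_signature(K_curr)[0]


K_before = [[2.0, 1.0], [1.0, 2.0]]    # positive definite
K_after = [[2.0, 1.0], [1.0, -1.0]]    # one negative eigenvalue
```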

Two methods are known for classifying the detected singularities as limit points or bifurcation points. The first is based on the current stiffness parameter (Brendel & Ramm 1980), and the second on the properties of the so-called generalised deflection (Huang & Atluri 1995). Here, the identified instability points are classified as limit points or bifurcation points using the latter strategy.

If the identified instability points are limit points (snap-through/snap-back buckling), 
the arc-length controlled equilibrium iterations will successfully trace the post-buckling 
solution path. Several methods have been proposed for automated branch-switching in 
post-buckling structural analysis - e.g. perturbation method (Wagner & Wriggers 1988) 
and linearised asymptotic solution technique (Huang & Atluri 1995). Normally, if the 
instability point is a bifurcation point, its location is computed and then, based on an 
eigenvalue solution, appropriate perturbation is applied to follow the desired post-buckling 
branch in an asymptotic linear sense. 

The nonlinear fundamental state between two solution points $n-1$ and $n$ in the neighbourhood of a bifurcation point is linearised to obtain the asymptotic solution (Koiter 1945; Huang & Atluri 1995). After linearising the nonlinear path between $n-1$ and $n$, consider an adjacent (asymptotic) state $\tilde{\mathbf{q}}_k$ near the fundamental state $\mathbf{q}_k$:

$$\tilde{\mathbf{q}}_k = \mathbf{q}_k + \boldsymbol{\eta}_k = \mathbf{q}_{n-1} + \bar{\lambda}_k\,\Delta\mathbf{q}_n + \boldsymbol{\eta}_k, \qquad (17)$$

with $\Delta\mathbf{q}_n = \mathbf{q}_n - \mathbf{q}_{n-1}$. Substitute (17) into (10), and group the tangent stiffness components that are independent of, linearly dependent on and quadratically dependent on the linearised load parameter $\bar{\lambda}_k = (\lambda_k - \lambda_{n-1})/(\lambda_n - \lambda_{n-1})$ as $\mathbf{K}_{0n}$, $\mathbf{K}_{1n}$ and $\mathbf{K}_{2n}$, respectively. Applying the condition of buckling at load level $\bar{\lambda}_k$ then gives the following iterative equations for the eigenvalue problem:

$$\mathbf{K}_{0n}\cdot\boldsymbol{\eta}_k = \bar{\lambda}_k\left(-\mathbf{K}_{1n} - \bar{\lambda}_{k-1}\mathbf{K}_{2n}\right)\cdot\boldsymbol{\eta}_k, \qquad (18)$$

where $\bar{\lambda}_{k-1}$ is the approximate eigenvalue from the previous iteration. The approximate critical buckling load factor $\bar{\lambda}_{cr}$ so obtained can be used to compute the eigenvector $\boldsymbol{\eta}$, which can be normalised using the condition

$$\boldsymbol{\eta}^T\cdot\mathbf{K}_{0n}\cdot\boldsymbol{\eta} = 1. \qquad (19)$$

Since the problem is linearised, its solution understandably consumes much less computer time.
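The iterative eigenvalue equation $\mathbf{K}_{0n}\boldsymbol{\eta}_k = \bar{\lambda}_k(-\mathbf{K}_{1n} - \bar{\lambda}_{k-1}\mathbf{K}_{2n})\boldsymbol{\eta}_k$ is a fixed-point iteration on a quadratic eigenvalue problem. With $\bar{\lambda}_{k-1}$ frozen, each step is a generalised linear eigenproblem. The Python sketch below assumes that $\mathbf{K}_{0n}$ is positive definite (the fundamental state before buckling). It rewrites each step as $\mathbf{K}_{0n}^{-1}(-\mathbf{K}_{1n} - \bar{\lambda}_{k-1}\mathbf{K}_{2n})\boldsymbol{\eta} = (1/\bar{\lambda}_k)\boldsymbol{\eta}$, so the smallest positive $\bar{\lambda}_k$ corresponds to the largest positive eigenvalue. The result is normalised with $\boldsymbol{\eta}^T\mathbf{K}_{0n}\boldsymbol{\eta} = 1$. The starting value and tolerances are assumptions, and the function name is hypothetical.

```python
import numpy as np


def critical_eigenpair(K0, K1, K2, lam_start=0.0, tol=1e-10, max_iter=200):
    """Fixed-point iteration on the linearised buckling problem.

    Returns (lambda_bar_cr, eta) with eta normalised so that eta^T K0 eta = 1.
    """
    K0, K1, K2 = (np.asarray(M, dtype=float) for M in (K0, K1, K2))
    lam_prev = lam_start
    for _ in range(max_iter):
        B = -K1 - lam_prev * K2
        mu, vecs = np.linalg.eig(np.linalg.solve(K0, B))      # mu = 1 / lambda_bar_k
        real = np.abs(mu.imag) <= 1e-12 * max(1.0, np.abs(mu).max())
        candidates = np.nonzero(real & (mu.real > 0.0))[0]
        if candidates.size == 0:
            raise ValueError("no positive real eigenvalue: no buckling in this range")
        k = candidates[np.argmax(mu.real[candidates])]     # smallest positive lambda_bar
        lam = 1.0 / mu.real[k]
        eta = vecs[:, k].real
        if abs(lam - lam_prev) <= tol * max(1.0, abs(lam)):
            break
        lam_prev = lam
    else:
        raise RuntimeError("eigenvalue iteration did not converge")
    eta = eta / np.sqrt(eta @ K0 @ eta)                    # normalisation eta^T K0 eta = 1
    return lam, eta


# Diagonal test case: det(K0 + l K1 + l^2 K2) = (l - 1)(l - 2)^2 (l - 3), lowest root l = 1
K0 = np.diag([2.0, 6.0])
K1 = np.diag([-3.0, -5.0])
K2 = np.diag([1.0, 1.0])
lam_cr, eta = critical_eigenpair(K0, K1, K2)
```

The load factor at the bifurcation point then follows from the definition of $\bar{\lambda}_k$ as $\lambda_{cr} = \lambda_{n-1} + \bar{\lambda}_{cr}(\lambda_n - \lambda_{n-1})$.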

A linear combination of the eigenvector r) and its orthogonal counterpart p is used to 
excite an internal perturbation in the nonlinear fundamental solution path so as to switch 
to the desired secondary post-buckling paths (Huang & Atluri 1995). 

Finally, several convergence criteria, based on various residual displacement and/or residual force norms, are available in the literature. Here, both displacement-based and force-based norms are used for verifying the convergence of nonlinear solutions. One may refer to Naganarayana (1995) for a unified presentation of the different strategies involved in a completely automated post-buckling solution for finite element analysis of geometrically nonlinear structures.
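A combined displacement-and-force convergence test can be sketched as follows. The relative form of each norm and the tolerance values are assumptions for illustration, not the criteria used in NONCATS, and the function name is hypothetical.

```python
import numpy as np


def is_converged(dq, q, R, lam_F, tol_disp=1e-4, tol_force=1e-4):
    """Accept an iterate only if both the displacement and force norms are small.

    dq    : latest iterative displacement correction
    q     : current total displacement vector
    R     : residual (out-of-balance) force vector
    lam_F : current external load vector lambda * F
    """
    tiny = np.finfo(float).tiny
    disp_ok = np.linalg.norm(dq) <= tol_disp * max(np.linalg.norm(q), tiny)
    force_ok = np.linalg.norm(R) <= tol_force * max(np.linalg.norm(lam_F), tiny)
    return bool(disp_ok and force_ok)


ok = is_converged(dq=[1e-6, 0.0], q=[1.0, 0.5], R=[1e-6], lam_F=[1.0])
not_ok = is_converged(dq=[1e-2, 0.0], q=[1.0, 0.5], R=[1e-6], lam_F=[1.0])
```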

5. NONCATS: NONlinear Computational Analysis Tool for Structural applications

A finite element software package has been developed for general nonlinear analysis of stiffened delaminated structures, based on the formulation presented in this paper (Huang et al 1995) and aided by simple pre- and post-processing. Figure 7 shows a schematic diagram of the software organisation.

The element library incorporates the shear-flexible curved 2-noded beam (BEAM2) and 3-noded quasi-conforming shell (SHELL3) elements. Locking problems are alleviated by using reduced integration for the membrane strain energy. The core finite element package is supported by general nonlinear solution tools. The solution module incorporates Gauss elimination within a cycle of Newton-Raphson iterations. The load incrementation is automated using the arc-length continuation technique for optimal convergence. Detection and classification of the instability points are also automated, based on certain specific properties of the generalised deflection and some simple heuristic rules. If the detected instability is found to be of the bifurcation type, the solution automatically switches path based on asymptotic post-buckling theory. Finally, the solution is tested for convergence. If the solution does not meet the convergence requirements, it is sought again with new, appropriate initial load increments. Once the incremental solution has converged, the displacements are processed to obtain stresses, stress resultants and displacement gradients. These are in turn used for computing the energy release rate distribution along the delamination front.

Figure 7. NONCATS: NONlinear Computational Analysis Tool for Structural applications.

Simple pre- and post-processing is provided using the plotting program gnuplot. Topological modelling is done based on user-supplied geometric data for the substructures and their connectivity. A finite element mesh generator has been developed for finite element modelling at the substructure level. Multi-point constraints satisfying the conditions of Reissner-Mindlin plate theory are incorporated to model the delamination front as well as the stiffener-shell joints. The geometric and finite element models are interfaced with gnuplot for graphical presentation. The results at the end of each load increment (nodal displacements, solution paths, stresses/stress resultants, and the energy release rate distribution along the delamination front) are also interfaced with gnuplot for graphical presentation.


6. Numerical experiments: Coupled failure processes in delaminated structures 

In this section, the proposed finite element analysis strategy is validated using an available analytical solution for local buckling and delamination growth. Several numerical experiments have been conducted to establish the influence of several structural parameters on post-buckling structural behaviour and delamination growth (Naganarayana & Atluri 1995c; Naganarayana & Huang 1995; Naganarayana et al 1995). Here the model is first validated with reference to an available analytical solution. A few salient numerical examples are then considered to demonstrate the coupled failure processes in a delaminated stiffened/composite structure, i.e. the detrimental interaction between geometric failure (local/global laminate buckling) and material failure (delamination growth).

6.1 Model validation 


Here, we shall validate the finite element solutions with reference to an available analytical solution (Evans & Hutchinson 1984), using an isotropic square plate of edge length $L$ with a central elliptic delamination. The plate is subjected to biaxial compressive loads, and its boundary is assumed to be clamped against out-of-plane deformations. One quarter of the plate is modelled by imposing appropriate symmetry conditions: 264 shell elements are used for the nondelaminated plate and 192 elements each for the delaminate and base plates. The reference applied biaxial compressive loads are assumed to be of unit intensity ($F_1 = 1.0$). The equilibrium equations are solved at each load step for an applied load $F = \lambda F_1$, where $\lambda$ is the corresponding load factor.

The structure is assumed to be isotropic, with Young's modulus $E = 6500$ and Poisson's ratio $\nu = 0.3$. The laminate thickness is chosen as $t_1 = 0.05L$. The numerical experiment is conducted for a near-surface circular delamination with $t_2/t_1 = 0.01$, $a/b = 1.0$ and $a/L = 0.3$. Assume that the base plate and the nondelaminated plate are infinitely stiff compared with the delaminated plate. The delaminated plate can then be considered as a clamped circular plate under the same radial compressive stress (Evans & Hutchinson 1984). The buckling strength of the delaminate plate, $\sigma_c\ (= \lambda_{cr}^l F_1/t_1)$, is then given by

$$\sigma_c = 1.2233\,\frac{E}{1-\nu^2}\left(\frac{t_2}{a}\right)^2. \qquad (20)$$


The local buckling strength of the delaminate plate obtained from the finite element analysis agrees very closely with the analytical estimate (20), as shown in figure 8a.
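As a worked check of the magnitudes involved, the Python sketch below evaluates the buckling stress for the validation data. It assumes that (20) has the classical form $\sigma_c = 1.2233\,[E/(1-\nu^2)]\,(t_2/a)^2$, and it assumes a unit edge length $L = 1$, since the paper states only ratios. The coefficient is close to the classical clamped-circular-plate value $14.68/12 \approx 1.2235$. That value follows from $N_{cr} = 14.68\,D/a^2$ with $D = Et^3/[12(1-\nu^2)]$. The function name is hypothetical.

```python
def clamped_circular_buckling_stress(E, nu, t2, a, coeff=1.2233):
    """sigma_c = coeff * E / (1 - nu^2) * (t2 / a)^2 (assumed form of the estimate)."""
    return coeff * E / (1.0 - nu**2) * (t2 / a) ** 2


L = 1.0                      # assumed unit edge length
t1 = 0.05 * L                # laminate thickness
t2 = 0.01 * t1               # delaminate thickness, t2/t1 = 0.01
a = 0.3 * L                  # delamination radius (a/L = 0.3, a/b = 1)
F1 = 1.0                     # unit reference load intensity

sigma_c = clamped_circular_buckling_stress(6500.0, 0.3, t2, a)
lam_cr_local = sigma_c * t1 / F1   # from sigma_c = lambda_cr * F1 / t1
print(f"sigma_c = {sigma_c:.4e}, local buckling load factor = {lam_cr_local:.4e}")
```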

Further, assuming that the post-buckling deformation is axisymmetric and nearly linear in the neighbourhood of the local buckling point, the pointwise energy release rate is given by (Evans & Hutchinson 1984)

$$G_{RB}(\Gamma) = \frac{(1-\nu^2)\,t_2}{(1.8285+\nu)\,E}\,(\sigma_0 - \sigma_c), \qquad (21)$$

where $\sigma_0\ (= \lambda F_1/t_1)$ is the actual stress level at which the energy release rate is being computed.

The ratio $G_{FE}/G_{RB}$ is plotted along the delamination periphery ($\theta = 0^\circ$ to $90^\circ$ for the quarter circle) in figure 8b for the case of a very thin delaminate ($t_2/t_1 = 0.01$). It can be observed that $G_{FE}$ is close to $G_{RB}$ when the post-buckling loads are in the close vicinity of the local buckling point (i.e. $\lambda^* = \lambda/\lambda_{cr}^l \approx 1.0$). However, $G_{RB}$ is underestimated compared with $G_{FE}$ even when $\lambda \approx \lambda_{cr}^l$. This is because, in the present problem, the base plate is flexible, as opposed to the rigid base considered in Evans & Hutchinson (1984), even though the delaminate plate is very thin compared with the total laminate thickness ($t_2/t_1 = 0.01$).

The laminate is also thin compared with its edge length $L$ ($t_1/L = 0.05$). Hence, the finite element model represents reasonably flexible laminate and base plates as well.
Figure 8. Model validation: circular thin delaminate in a square isotropic plate. (a) Critical local buckling strength: comparison of the FE solution with an analytical solution; (b) pointwise energy release rate distribution along the delamination edge: comparison of the FE solution with an analytical solution; (c) effect of the 'quasi-linear' post-buckling behaviour of the delaminate on the energy release rate.
The deviation increases as the applied load increases beyond the critical value for local buckling of the delaminate plate. This is because the analytical solution (21) is based on the assumption of quasi-linear post-buckling behaviour of the delaminate plate. In practice, however, particularly when the laminate is thin, the post-buckling behaviour of the delaminate plate is highly nonlinear. Accordingly, energy release rates much higher than $G_{RB}$ are expected, as shown in figure 8c. Note that, in figure 8c, the actual stress ($\sigma$) and displacement ($\Delta$) are normalised by the critical stress $\sigma_c$ and the associated critical inward radial displacement $\Delta_c$, respectively.

6.2 Laminated shell with elliptic delamination 

In this section, a cylindrical laminated shell of edge length $L$ with a central elliptic delamination (near the inner shell surface) under axial compressive loads is considered. The shell is assumed to consist of 32 orthotropic laminae of equal thickness, stacked symmetrically: $(0/90/45/-45)_s$. The shell thickness is assumed to be $t_1 = 0.05L$. The delamination configuration is fixed as $a/L = 0.3$, $a/b = 1.5$ and $t_2/t_1 = 1/32$. The major axis of the delamination is oriented parallel to the shell axis. The material properties of each layer are: $E_1 = 208000$; $E_2 = 26000$; $\nu_{12} = \nu_{13} = \nu_{23} = 0.16$; $G_{12} = G_{13} = G_{23} = 7500$. The reference load intensity is assumed to be unity. The shell boundary is clamped against out-of-plane deformation. The edge length is kept constant ($R\theta = L$, where $R$ is the radius of curvature and $\theta$ is the included angle) while the shell curvature is varied, to study its effects on the buckling and delamination growth behaviour of the structure.

The post-buckling deformation of the delaminate and base shells (transverse deflection $w$ at the centroid) is depicted for typical shell curvatures in figures 9a-d. It can be observed that the critical load factor for local delaminate buckling increases linearly with curvature (figure 9e). The global buckling strength of the structure also increases with shell curvature; however, owing to the presence of the delamination, the structure exhibits reduced global buckling strength (results not shown). The maximum and average pointwise energy release rates are presented against load factor for typical shell curvatures in figures 10a-d. It can be seen that the energy release rate decreases as the shell curvature increases (figure 10e). Thus local delaminate buckling and delamination growth are delayed in shells compared with plates.

6.3 Stiffened laminated plate with elliptic delamination 

In this section, we consider a laminated composite square plate of edge length $L$ with 32 orthotropic laminae of equal thickness stacked symmetrically: $(0/90/45/-45)_s$. The plate thickness is assumed to be $t_1 = 0.025L$. The central elliptic delamination configuration is fixed as: $a/L = 0.15$; $a/b = 1.50$; and $t_2/t_1 = 1/32$. The material properties for each layer are taken as in the previous example. The plate is stiffened in both directions in a symmetric fashion, as shown in figure 11. The distance between the stiffeners is assumed to be $d = L/2$. The sectional properties of each stiffener in axial, in-plane flexure, out-of-plane flexure, twisting, and transverse shear deformations are respectively: $EA = 0.104 \times 10^9$, $EI_{xx} = 0.8667 \times 10^9$, $EI_{yy} = 0.2167 \times 10^9$, $GJ = 0.39063 \times 10^7$,
Figure 10. Cylindrical shell: Effects of shell curvature on average and maximum energy release rates: $\phi = 0^\circ$ (a), $60^\circ$ (b), $120^\circ$ (c) and $180^\circ$ (d); (e) average and maximum energy release rates.

Case-a: plate with no stiffeners,

Case-b: plate with non-eccentric stiffeners, $e = 0$,

Case-c: plate with stiffeners on the opposite side of the delamination, $e = -10$,

Case-d: plate with stiffeners on the same side as the delamination, $e = +10$.

The post-buckling deflections of the delaminate and the base plates are depicted in figure 12. The delaminate buckling strength increases with the inclusion of stiffeners. It may be noted that stiffeners with zero eccentricity with reference to the plate provide the maximum delay in delaminate plate buckling. Interestingly, stiffeners do not appreciably increase the local buckling strength when the delamination is located on the side opposite the stiffeners (case-c).

Figure 11. Stiffened laminated plate with central elliptic delamination.

The average and maximum pointwise energy release rates are presented for the different cases in figure 13. It can be observed that non-eccentric stiffeners (case-b) considerably decrease both the average and the maximum energy release rates for a given load. However, eccentric stiffeners (case-c and case-d) lead to an appreciable increase in the average energy release rate for a given load. Though stiffeners on the same side as the delamination (case-d) slightly decrease the maximum energy release rate, stiffeners on the side opposite the delamination (case-c) increase the maximum energy release rate considerably. Thus, non-eccentric stiffeners (case-b) delay delamination growth appreciably, whereas eccentric stiffeners (case-c and case-d) may considerably accelerate delamination growth. Hence, from both the geometric and the material failure points of view, non-eccentric stiffeners are preferable when reinforcing a delaminated structure. However, in most aerospace applications the stiffeners are located internally for aerodynamic reasons, and the external surface is highly susceptible to loads causing delaminations. The results for case-c therefore appear to be the most critical from practical considerations. Similar behaviour was observed for stiffened laminated shells (results not shown).

B P Naganarayana and S N Atluri

Figure 12. Stiffened laminated plate: Post-buckling delaminate and base deflections: (a) no stiffeners; (b) symmetric stiffeners ($e = 0$); (c) eccentric stiffeners ($e = -10$); (d) eccentric stiffeners ($e = +10$).


7. Conclusions

Laminated and stiffened structures are particularly prone to interlaminar debonding (delamination) failures, since the interlaminar bond strength is much lower than the in-plane laminar strength. Delaminations are very dangerous because they drastically reduce the laminate strength, particularly its buckling strength. In addition, delaminations grow under operating conditions, particularly under compressive loads, further reducing the structural strength and possibly leading to catastrophic structural failure. In this paper, a complete methodology is presented for analysing delaminated structures for their residual strength and for the possible growth of the delamination (particularly under compressive loading), which could also be used for optimal structural design.
Figure 13. Stiffened laminated plate: Average and maximum energy release rate.

A robust finite element method is presented for modelling delaminated structures, for obtaining an accurate structural response, and for predicting delamination growth in terms of the pointwise energy release rate (figure 7). Using the guidelines outlined in this paper, this program could be developed into powerful general-purpose software for modelling and analysing failure in general stiffened composite structures, as shown in figure 14. Such software would combine the existing library of robust finite elements (e.g. FEPACS: version-1.0, Prathap & Naganarayana 1991), solution capabilities (e.g. FEPACS: version-2.1, Prathap et al 1994), NONCAT (Huang et al 1995), and advanced software for structural and finite element modelling. It would also need special modules for modelling damage, repair and damage-control mechanisms, for predicting damage initiation and growth, and for designing/analysing appropriate repair/damage-control mechanisms, together with expert advisor systems for problem modelling and for processing the results.

Figure 14. A general purpose expert-aided environment for failure prediction/assessment and designing damage repair/control mechanisms: A desirable infrastructure.

Curved shear-flexible 2-noded stiffener and 3-noded shell elements free of membrane locking and nonlinear locking are presented. An automated general nonlinear solution strategy that can pass instability points of any kind is incorporated, so that the multiple post-buckling solution paths that can exist in delaminated structures, and their interaction, can be computed accurately. Arc-length continuation is used to pass the instability points and to obtain an optimal convergence rate. The instability points are detected and classified based on the specific properties of the tangent stiffness and the generalised deflection. If the detected instability is of bifurcation type, branch-switching (to follow the desired secondary solution path) is achieved effectively using a simple and cost-effective method based on an asymptotic linearised eigenvalue solution at the instability point. Finally, the displacements and displacement gradients are post-processed to compute stresses and stress resultants at element centroids and the pointwise energy release rate distribution along the delamination front. The 3-dimensional J-integral is used to derive the pointwise energy release rate as a function of the stress resultants and displacement gradients in the neighbourhood of the delamination front, and of the jump in strain energy density across the delamination edge.

Unlike conventional methods of 3-dimensional analysis and/or global-local analysis, the method presented in this paper is simple and cost-effective, particularly for the nonlinear post-buckling behaviour of delaminated structures subjected to compressive loads. The methodology also provides the capability to capture multiple buckling modes (local, intermediate and global), to predict delamination growth in the pre-buckling and post-buckling regimes, and to compute effectively the interaction between geometric and material failures (buckling and delamination growth, in this case). Some typical numerical examples are critically examined to validate the proposed 2-dimensional computational model and to demonstrate the coupled geometric and material failure mechanisms in delaminated composite structures.
Abstract. Linear Elastic Fracture Mechanics (LEFM) has been widely used in the past for fatigue crack growth studies, but this is acceptable only in situations of small scale yielding (SSY). In many practical structural components, the conditions of SSY could be violated, and one has to look for fracture criteria based on elasto-plastic analysis. The crack closure phenomenon, one of the most striking discoveries based on inelastic deformations during crack growth, has a significant effect on the fatigue crack growth rate. Numerical simulation of this phenomenon is computationally intensive and involved, but it has been successfully implemented. Under elasto-plastic conditions, stress intensity factors and strain energy release rates lose their meaning, and J-integral (or incremental J-integral) values are applicable only in specific situations; alternate path-independent integrals have therefore been proposed in the literature for use with criteria based on elasto-plastic fracture mechanics (EPFM). This paper presents certain salient features of two independent finite element studies of relevance to fatigue crack growth, in which elasto-plastic analysis becomes significant. These problems can be handled only in the current-day computational environment, and would have been only a dream just a few years ago.

Keywords. Fatigue crack growth; material nonlinearity; finite element analysis.


1. Introduction 

Fracture mechanics based design has become mandatory for critical structural components in high-technology industries. The current technological viewpoint is damage-tolerant design, which aims to specify the life until the damage grows to unacceptable levels. The field is entirely interdisciplinary, requiring inputs from both experimental and numerical methods. The experimental methods required include accurate methods for elastic and inelastic material property evaluation, in-service load estimation, non-destructive testing (NDT) and full-scale fatigue testing. Computational methods in structural mechanics, such as finite element or boundary element methods, provide stress analysis of cracked bodies and post-process fracture parameters such as stress intensity factors or strain energy release rates. Data from both sources are fed as input to the analysis of failure or fatigue crack growth for life estimation/extension. For such a programme to succeed, advanced numerical methods must be used to analyse the structure with realistic assumptions covering large deformations and elasto-plastic and visco-elastic/plastic material behaviour. This paper focuses on elasto-plastic material behaviour and its role in fatigue crack growth studies of metallic structures under cyclic loading.

In many problems involving small-scale yielding (SSY) at the crack tip, linear elastic fracture mechanics (LEFM) has been used in the past for fatigue crack growth studies and life estimation. Several well-known LEFM-based models are in use to estimate fatigue crack growth from numerical estimates of stress intensity factors or strain energy release rates (or J) (Broek 1978). In many practical structures, and in particular in structural components such as those used in aerospace vehicles, SSY could be violated, and one has to look for fracture criteria based on elasto-plastic analysis.

The crack closure phenomenon proposed by Elber (1970) was one of the most striking discoveries, and it modified fatigue crack growth expressions in terms of an effective stress intensity range. Practical use of this phenomenon is generally limited by the difficulty of experimental measurement and by the complexity of numerical simulation when estimating the crack closure stress and the effective stress intensity range. Recently, success has been reported in experimental measurement (Sunder & Dash 1982) and in numerical simulation, though the problem is still computationally intensive. Even in the absence of crack closure, fatigue crack growth analysis has complications, because certain fracture parameters are not valid under cyclic loading (or under loading/unloading). Under elasto-plastic deformation and cyclic loading, stress intensity factors and strain energy release rates lose their meaning, and J-integral (or incremental J-integral) values are applicable only with the deformation theory of plasticity (Rice 1968). For this reason, alternate path-independent integrals were proposed by Atluri & Nishioka (1982) for use in these cases.

This paper presents two different finite element studies of relevance to fatigue crack growth. The first deals with the numerical simulation of the crack closure phenomenon and its use in studying the effect of low-high and high-low blocks of constant amplitude loading on a standard compact tension (CT) specimen. The major issues related to the numerical simulation are highlighted. In the second, fatigue crack growth around interference lug joints is analysed in the presence of considerable inelastic deformation around the joint. Here the $\Delta T^*$ integrals proposed by Atluri & Nishioka (1982) were used to fit the fatigue crack growth data with success.


2. Literature 

It is not possible to review the entire literature on elasto-plastic finite element analysis here. A brief review is presented of fracture-related work dealing with elasto-plastic analysis under cyclic loading, keeping the focus on the theme of this paper.


Elasto-plastic analysis of fatigue crack growth 





2.1 Fatigue crack growth 

In general, loads on a practical structure are cyclic and vary in magnitude. The loading 
is represented by a maximum and a minimum applied stress/load in each cycle. If the 
maximum and minimum vary from cycle to cycle, the loading is referred to as variable 
amplitude loading, and if they are constant then the loading is constant amplitude (CA) 
loading. 

The difference between the stress intensity factor values at the maximum and the minimum stress is known as the stress intensity range $\Delta K$ (figure 1). Paris et al (1961) examined a large body of experimental data and gave an empirical relation between the stress intensity factor range ($\Delta K$) and the crack growth rate ($da/dN$):

$$\frac{da}{dN} = C(\Delta K)^m, \qquad (1)$$

where $C$ and $m$ are material constants. When predictions are made on the basis of (1), it is assumed that only the tensile part of the load cycle contributes to fatigue damage and that the crack surfaces close at zero load. Forman et al (1967) modified the Paris equation to take into account the stress ratio $R$ (the ratio of minimum to maximum stress in a cycle).
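For constant amplitude loading, (1) can be integrated to estimate the number of cycles a crack needs to grow from $a_0$ to $a_f$. The sketch below assumes $\Delta K = Y\,\Delta\sigma\sqrt{\pi a}$ with a constant geometry factor $Y$. This is an assumption, since $Y$ generally depends on crack length and geometry. The sketch compares a midpoint-rule integration with the closed form valid for $m \neq 2$. The material constants and dimensions in the usage lines are hypothetical.

```python
import math


def paris_cycles(a0, af, dsigma, C, m, Y=1.0, n=20000):
    """Cycles to grow a crack from a0 to af: N = integral of da / (C * dK**m).

    Assumes dK = Y * dsigma * sqrt(pi * a) with a constant geometry factor Y.
    """
    h = (af - a0) / n
    cycles = 0.0
    for i in range(n):
        a = a0 + (i + 0.5) * h                       # midpoint of the i-th interval
        dK = Y * dsigma * math.sqrt(math.pi * a)
        cycles += h / (C * dK**m)
    return cycles


def paris_cycles_closed_form(a0, af, dsigma, C, m, Y=1.0):
    """Closed-form integral of the Paris law for constant Y and m != 2."""
    k = C * (Y * dsigma * math.sqrt(math.pi)) ** m   # da/dN = k * a**(m/2)
    e = 1.0 - m / 2.0
    return (af**e - a0**e) / (k * e)


# Hypothetical data: C in m/cycle per (MPa*sqrt(m))**m, stress in MPa, lengths in m
N_num = paris_cycles(1e-3, 1e-2, 100.0, 1e-11, 3.0)
N_exact = paris_cycles_closed_form(1e-3, 1e-2, 100.0, 1e-11, 3.0)
print(round(N_num), round(N_exact))
```

Most of the life is spent while the crack is short, because the integrand $a^{-m/2}$ is largest near $a_0$. The initial crack size therefore dominates such estimates.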

The discovery of crack closure by Elber (1970) led to the following modification of the Paris equation:

$$\frac{da}{dN} = C(\Delta K_{\mathrm{eff}})^m, \qquad (2)$$

where $\Delta K_{\mathrm{eff}}$ is the effective stress intensity range between the maximum stress and the crack closure stress. Elber attributed crack closure to local material yielding near the crack tip and to the residual plastic wake behind the crack tip during crack growth. This can cause the crack tip to close even under a positive applied stress.
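Equation (2) differs from (1) only in its driving force. The following is a minimal sketch of Elber's effective range. It assumes that the crack opening (closure) level is available as a stress intensity value $K_{op}$, obtained from measurement or from a numerical simulation. When $K_{op} \le K_{min}$, the crack is open throughout the cycle and the full range is effective. The function names are illustrative only.

```python
def delta_k_eff(k_max, k_min, k_op):
    """Elber effective stress intensity range: the part of the cycle above opening."""
    k_open = max(k_op, k_min)          # crack fully open if closure level is below K_min
    return max(k_max - k_open, 0.0)    # no driving force if the crack never opens


def growth_rate(C, m, k_max, k_min, k_op):
    """Crack growth rate per cycle from the closure-modified Paris law, eq. (2)."""
    return C * delta_k_eff(k_max, k_min, k_op) ** m


# Hypothetical cycle: K_max = 20, K_min = 2, opening at K_op = 8 (MPa*sqrt(m))
print(delta_k_eff(20.0, 2.0, 8.0))     # 12.0 instead of the nominal range 18.0
```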

2.2 Elasto-plastic analysis 

Finite element analysis including nonlinear material behaviour is briefly described later in this paper. Nonlinear material characteristics are determined from uniaxial load tests. For biaxial stress states, yielding is identified using yield criteria such as von Mises or Tresca. Incremental finite element analysis is carried out for this problem. The most useful finite element formulation for this purpose is the 'initial stress' method presented by Zienkiewicz et al (1969).

2.3 Contact stress analysis 

There is an extensive literature on contact stress analysis of problems in which the contact between two elastic bodies changes with load level. Here the stress and displacement fields are nonlinear with applied load because of the changing configuration. In the presence of elasto-plastic deformation, two types of nonlinearity occur together. The literature dealing with problems involving both nonlinearities is scanty. The only attempt in this direction seems to be by Brombolich (1973), who analysed the fastener joint problem combining the two nonlinearities, but the details of his work are not available. This can be attributed to the fact that these problems are computationally intensive. They can be attempted now owing to the large computing power available today, though tackling them would have been only a dream several years ago.

2.4 Numerical methods for estimation of crack closure levels 

Two-dimensional elasto-plastic finite element analysis of crack growth and closure has been conducted by several investigators over the years, and most of this work was extensively reviewed by Seshadri (1995). Newman and coworkers were among the first to study the crack closure phenomenon numerically in centre-cracked panels under constant amplitude loading or two-level block loading (Newman & Armen 1975; Newman 1976). Their numerical estimates of crack closure were qualitatively consistent with experimental measurements. Another numerical simulation of crack closure was by Ogura & Ojhi (1977), who studied crack growth from notches under variable amplitude loading. Special crack tip elements and translation of the near-crack-tip mesh were used by Nakagaki & Atluri (1980). Blom & Holm (1985) compared numerical estimates of crack closure in CT specimens at different stress ratios with experimental measurements. Lalor & Sehitoglu (1988) contributed extensively to the numerical study of crack closure from notches under constant amplitude loading.

Later, McClung & Sehitoglu (1989) made an extensive review of the basic modelling issues and proposed certain criteria that have to be met by the finite element model. One of the most important criteria was the refinement of the mesh along the crack line; this will be discussed later with the numerical results. Another important issue is the node release scheme used to simulate fatigue crack growth. The most common scheme is that suggested by Newman & Armen (1975), which releases the crack tip node at the maximum load in each cycle. Newman completed the redistribution of the load due to node release before proceeding with the analysis. A similar technique was employed by Chermahini et al (1988) and Blom & Holm (1985). Ogura & Ojhi (1977) released the crack tip at the minimum load in the cycle. Nakagaki & Atluri (1980) released the crack tip load at different points along the forward loading excursion and found that the opening level depended on the timing of the node release. Lalor & Sehitoglu (1988) employed a node release scheme in which the crack tip was advanced immediately after the point of maximum load, during every increment of unloading. Three primary node release schemes were considered by McClung & Sehitoglu (1989): node release at maximum load with redistribution, node release at minimum load, and node release immediately after maximum load. Each of these schemes involved the release of a node in every cycle. It was shown that, in general, the variation of results between the different node release schemes is not significant.

2.5 Elasto-plastic crack tip parameter 

There were serious attempts in the past to propose alternate fracture parameters, in the absence of crack closure effects, to fit crack growth behaviour in the presence of elasto-plastic deformations at the crack tip. The work of Rice (1968) led to the development of the well-known J-integral concept, which was derived on the assumption of self-similar crack growth. With inelastic deformations, the J-integral can still be used as long as the loading is proportional and there is no unloading. However, once crack growth commences, J ceases to be the crack-tip parameter (Brust et al 1985). Again, with the incremental flow theory of plasticity, J ceases to be a crack tip parameter, as it loses its path independence and its physical interpretation is no longer valid. However, for stationary cracks, Hutchinson (1968) and Rice & Rosengren (1968) have shown that J is still a controlling parameter under small scale and fully plastic conditions for power-law hardening materials. Also, Hutchinson & Paris (1979) argued that J remains a crack tip parameter for small amounts of crack growth and becomes invalid at larger amounts.

A number of alternative crack-tip fracture parameters were tried. The Crack-Tip Opening Displacement (CTOD) was monitored and used as an alternative parameter (Wells 1962). The Crack-Tip Opening Angle (CTOA) was another parameter tried for crack-growth correlation, but it is difficult to determine CTOA accurately through experiments. A combination of J and CTOA was used in many situations. Moreover, under mixed mode loading, CTOA does not describe crack growth. The Strain Energy Release Rate (SERR or G) was also tried as a possible crack-tip parameter (Nakagaki et al 1979).

All the above methods were found to be unsatisfactory. In search of an alternative fracture parameter in the elasto-plastic regime, a new path-independent parameter ($\Delta T_c$) was proposed. This parameter can be used with the flow theory of plasticity, can account for loading and unloading, and can handle arbitrary load histories. It was later modified by Atluri & Nishioka (1982), who defined a new parameter ($\Delta T_p^*$). This is a direct measure of the crack-tip field under the flow theory of plasticity, and under the deformation theory of plasticity it is equivalent to J as defined by Rice (1968).

2.6 Application to cyclic loading 

In the case of cyclic loading that leads to cyclic plastic deformation, the application of the J-integral in its original form is questionable, because it loses path independence. It is assumed, however, that crack growth occurs during the loading portion of any particular cycle, and that any further plastic deformation or damage occurring in the unloading portion is reflected in the subsequent loading portion. Here, the J-integral has been redefined with the above
any particular cycle.

Dowling (1977) proposed extending the J-integral concept to cyclic loading. Fatigue experiments were carried out on compact tension specimens (CTS), and the $\Delta J$ values were calculated from the area under the load-deflection curve only during the loading part of each load cycle. Crack closure was also corrected for by using only the portion of the area under the load-deflection curve where the crack is fully open. The elasto-plastic data described by the J-integral were superimposed on the elastic data based on LEFM, and a small scatter band was obtained when the results were compared. Dowling (1977) developed equations for calculating $\Delta J$ using the finite element results of Shih & Hutchinson (1976) for centre-cracked and edge-cracked panels.

However, it is evident from the literature that the application of $\Delta J$ has various limitations. The present work is based on the incremental flow theory of plasticity and the initial stress method, and it involves cyclic loading. Hence, $\Delta T_p^*$, proposed by Atluri & Nishioka (1982) to characterize stable crack growth in the elasto-plastic regime, is used. This parameter can be used for arbitrary loading and unloading and with the incremental flow theory of plasticity.

3. Finite element analysis 

The finite element analysis adopted is an incremental elasto-plastic analysis under cyclic loading, combined with contact stress analysis in certain ranges of loading. The formulations were carried out in three dimensions and can easily be reframed to study problems in two dimensions. The main feature of these problems is the two types of nonlinearity arising from varying contact and from material behaviour. These are dealt with by a novel approach using a marching solution. The basic FEM formulation is straightforward and will not be dealt with in detail. Three-dimensional analysis is carried out using 8-noded isoparametric brick elements, and two-dimensional analysis using four-/eight-noded quadrilateral elements. The von Mises yield criterion is:

$$F(\sigma_{ij}, \alpha_{ij}) = \Big[(\sigma_x-\alpha_x)^2 + (\sigma_y-\alpha_y)^2 + (\sigma_z-\alpha_z)^2 - (\sigma_x-\alpha_x)(\sigma_y-\alpha_y) - (\sigma_y-\alpha_y)(\sigma_z-\alpha_z) - (\sigma_z-\alpha_z)(\sigma_x-\alpha_x) + 3\big((\tau_{xy}-\alpha_{xy})^2 + (\tau_{yz}-\alpha_{yz})^2 + (\tau_{zx}-\alpha_{zx})^2\big)\Big]^{1/2} - \sigma_Y, \qquad (3)$$

where $\sigma_x, \sigma_y, \sigma_z, \tau_{xy}, \tau_{yz}, \tau_{zx}$ are the stresses corresponding to the current state; $\alpha_x, \alpha_y, \alpha_z, \alpha_{xy}, \alpha_{yz}, \alpha_{zx}$ are the components of the back stress, representing the translation of the centre of the yield surface; and $\sigma_Y$ is the current yield stress.
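Equation (3) and its gradient, which the flow rule and the hardening rule both need, can be coded directly. The sketch below assumes Voigt-ordered components $(\sigma_x, \sigma_y, \sigma_z, \tau_{xy}, \tau_{yz}, \tau_{zx})$ and the square-root form of (3). The function names are hypothetical. The gradient is taken with respect to these six components, so its dot product with a stress increment gives $dF$.

```python
import numpy as np


def yield_function(sig, alpha, sigma_Y):
    """Von Mises function with back stress, eq. (3); Voigt order (x, y, z, xy, yz, zx)."""
    sx, sy, sz, txy, tyz, tzx = np.asarray(sig, float) - np.asarray(alpha, float)
    q2 = (sx**2 + sy**2 + sz**2 - sx * sy - sy * sz - sz * sx
          + 3.0 * (txy**2 + tyz**2 + tzx**2))
    return np.sqrt(q2) - sigma_Y


def yield_gradient(sig, alpha):
    """dF/dsigma for the six Voigt components (undefined when sig == alpha)."""
    sx, sy, sz, txy, tyz, tzx = np.asarray(sig, float) - np.asarray(alpha, float)
    q = np.sqrt(sx**2 + sy**2 + sz**2 - sx * sy - sy * sz - sz * sx
                + 3.0 * (txy**2 + tyz**2 + tzx**2))
    return np.array([2 * sx - sy - sz, 2 * sy - sz - sx, 2 * sz - sx - sy,
                     6 * txy, 6 * tyz, 6 * tzx]) / (2.0 * q)


# Uniaxial stress of 300 with the yield surface centred at (50, 0, ...): on the surface
print(yield_function([300, 0, 0, 0, 0, 0], [50, 0, 0, 0, 0, 0], 250.0))
```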

If the stress state is such that $F(\sigma_{ij}, \alpha_{ij}) < 0$, the material is in the elastic state. When $F(\sigma_{ij}, \alpha_{ij}) = 0$, the inelastic state is initiated. Subsequent plastic behaviour under increasing stress and strain is determined by the flow theory of plasticity. The total incremental strain has elastic ($\Delta\varepsilon^{el}$) and plastic ($\Delta\varepsilon^{pl}$) components:

$$\Delta\varepsilon = \Delta\varepsilon^{el} + \Delta\varepsilon^{pl}. \qquad (4)$$
The incremental plastic strain is obtained using an associated flow rule based on Drucker's postulate for work-hardening materials. The flow rule can be written as

$$\Delta\varepsilon^{pl}_{ij} = d\lambda\,\frac{\partial F(\sigma_{ij}, \alpha_{ij})}{\partial \sigma_{ij}}, \qquad (5)$$

where $d\lambda$ is a non-negative scalar.

When yielding occurs, the stress state must remain on the translated yield surface. This condition can be derived as follows. Total differentiation of (3) gives

$$dF = \frac{\partial F}{\partial \sigma_{ij}}\,d\sigma_{ij} + \frac{\partial F}{\partial \alpha_{ij}}\,d\alpha_{ij}. \qquad (6)$$

The stress state remains on the yield surface only if $dF = 0$. Further,

$$\frac{\partial F}{\partial \alpha_{ij}} = -\frac{\partial F}{\partial \sigma_{ij}}, \qquad (7)$$

since $F$ is a function of $(\sigma_{ij} - \alpha_{ij})$. So (6) can be rewritten as

$$\frac{\partial F}{\partial \sigma_{ij}}\,(d\sigma_{ij} - d\alpha_{ij}) = 0. \qquad (8)$$

3.1 Hardening rule 

The finite element analysis, in particular under cyclic loading covering both loading and unloading, should properly represent the hardening portions of the stress-strain curve. At any stage of yielding, the stress state corresponds to a yield surface (or a yield curve in two dimensions). Once the yield surface at a stress state is known, the next issue is the shape, size and position of the subsequent yield surface.

A power-law uniaxial stress-strain response (Ramberg-Osgood type) of a material is shown in figure 2. At a particular stage, let B be the position on the stress-strain curve during the loading portion. For most practical materials, it has been observed that an increase in the yield limit in tension is accompanied by a reduction in the yield limit in compression. The yielding in compression is represented by C, resulting in anisotropic hardening (the case of isotropic hardening corresponds to an increase in the yield limit in tension accompanied by an equal increase in the yield limit in compression). This anisotropic hardening is known as kinematic hardening, and the phenomenon of stress-induced anisotropy is known as the Bauschinger effect.

Kinematic hardening results in a translation of the yield surface; a simple case is shown in figure 2. In problems involving plastic deformation during loading and reversed plasticity during unloading, it is essential to use a realistic kinematic hardening rule. In the present cases, the hardening rule proposed by Prager (1955) and later modified by Ziegler (1967) is used to evaluate the back stresses, which represent the coordinates of the centre of the yield surface:

$$d\alpha_{ij} = d\mu\,(\sigma_{ij} - \alpha_{ij}), \qquad (9)$$

where the scalar $d\mu$ ensures that the stress state remains on the translated yield surface. On substituting (9) into (8), it can be shown that

$$d\mu = \frac{(\partial F/\partial \sigma_{ij})\,d\sigma_{ij}}{(\sigma_{kl} - \alpha_{kl})\,(\partial F/\partial \sigma_{kl})}. \qquad (10)$$
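The following is a minimal sketch of one back-stress update with (9) and (10). For self-containment, it repeats a von Mises function of (3) in square-root form with Voigt order $(x, y, z, xy, yz, zx)$. The sketch only shows that the increment keeps $dF = 0$ to first order. A complete stress update would also compute the plastic strain from (5) and return the stress to the yield surface, and those steps are not specified here. All names are hypothetical.

```python
import numpy as np


def mises(sig, alpha, sigma_Y):
    """Eq. (3) in square-root form, Voigt order (x, y, z, xy, yz, zx)."""
    sx, sy, sz, txy, tyz, tzx = sig - alpha
    return np.sqrt(sx**2 + sy**2 + sz**2 - sx * sy - sy * sz - sz * sx
                   + 3.0 * (txy**2 + tyz**2 + tzx**2)) - sigma_Y


def mises_gradient(sig, alpha):
    """dF/dsigma with respect to the six Voigt components."""
    sx, sy, sz, txy, tyz, tzx = sig - alpha
    q = mises(sig, alpha, 0.0)
    return np.array([2 * sx - sy - sz, 2 * sy - sz - sx, 2 * sz - sx - sy,
                     6 * txy, 6 * tyz, 6 * tzx]) / (2.0 * q)


def ziegler_increment(sig, alpha, dsig):
    """Back-stress increment d_alpha = dmu * (sig - alpha), eq. (9), with dmu from eq. (10)."""
    grad = mises_gradient(sig, alpha)
    rel = sig - alpha
    dmu = (grad @ dsig) / (rel @ grad)
    return dmu * rel


sig = np.array([250.0, 0.0, 0.0, 0.0, 0.0, 0.0])   # on the initial yield surface
alpha = np.zeros(6)
dsig = np.array([0.1, 0.05, 0.0, 0.03, 0.0, 0.0])
dalpha = ziegler_increment(sig, alpha, dsig)
print(mises(sig + dsig, alpha + dalpha, 250.0))    # small: second order in dsig
```

The yield surface is shifted along $(\sigma_{ij} - \alpha_{ij})$ rather than along the gradient, which is the feature that distinguishes Ziegler's modification from Prager's original rule.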

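The equation of equilibrium to be solved can be written as

$$[K_e]\{U\} = \{P\} + \{Q\}, \qquad (11)$$

where $[K_e]$ is the elastic stiffness matrix, $\{P\}$ is the applied force vector, $\{Q\} = \int_V [B]^T \{\Delta\sigma''\}\,dV$, $\{\Delta\sigma''\} = \{\Delta\sigma'\} - \{\Delta\sigma\}$, $\{\Delta\sigma'\}$ is the elastic increment of stress for a given $\Delta\varepsilon$, and $\{\Delta\sigma\}$ is the true increment of stress for the same $\Delta\varepsilon$.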
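The initial stress iteration implied by (11) keeps the elastic stiffness fixed and moves the difference between the elastic and the true stress increments to the right-hand side as $\{Q\}$. The following is a minimal sketch for a single bar element under a load-unload history. The data are hypothetical: length $L$, area $A$, modulus $E$, yield stress $\sigma_Y$, and linear kinematic hardening modulus $H$. The 1-D return mapping used for the true stress increment is a standard choice assumed here, not taken from the paper.

```python
def stress_update(sig_n, alpha_n, deps, E, H, sigma_Y):
    """1-D elastic-plastic update with linear kinematic hardening (return mapping)."""
    sig_trial = sig_n + E * deps
    xi = sig_trial - alpha_n
    f = abs(xi) - sigma_Y
    if f <= 0.0:
        return sig_trial, alpha_n                  # elastic step
    sign = 1.0 if xi > 0.0 else -1.0
    dgamma = f / (E + H)                           # magnitude of plastic strain increment
    return sig_trial - E * dgamma * sign, alpha_n + H * dgamma * sign


def initial_stress_bar(loads, E=200e3, H=20e3, sigma_Y=250.0, A=100.0, L=1000.0,
                       tol=1e-10, max_iter=2000):
    """Solve Ke*dU = dP + Q per load level, with Q = A*(dsig_elastic - dsig_true)."""
    Ke = E * A / L                                 # elastic stiffness, never updated
    u, sig, alpha, P_prev = 0.0, 0.0, 0.0, 0.0
    displacements = []
    for P in loads:
        dP = P - P_prev
        du = dP / Ke                               # first, purely elastic estimate
        for _ in range(max_iter):
            deps = du / L
            sig_new, _ = stress_update(sig, alpha, deps, E, H, sigma_Y)
            Q = A * (E * deps - (sig_new - sig))   # initial-stress (correction) load
            du_next = (dP + Q) / Ke
            if abs(du_next - du) < tol:
                du = du_next
                break
            du = du_next
        # Commit the converged state before the next load level
        sig, alpha = stress_update(sig, alpha, du / L, E, H, sigma_Y)
        u += du
        P_prev = P
        displacements.append(u)
    return displacements


print(initial_stress_bar([15000.0, 30000.0, 0.0]))   # load, yield, then unload
```

Because $[K_e]$ is never updated, each iteration is cheap. However, convergence slows as the material softens: in the plastic range, the error contracts by roughly $1 - E_t/E$ per iteration, where $E_t = EH/(E+H)$.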

When carrying out the analysis for the plane strain condition, plane strain (volumetric) locking must be avoided in the formulation. For this purpose, the total strain energy is divided as

$$U_{\text{total}} = \tfrac{1}{2}G\int_V \{\varepsilon_d\}^T[D]\{\varepsilon_d\}\,dV + \tfrac{1}{2}\lambda\int_V \varepsilon_v^2\,dV, \qquad (12)$$

where $\varepsilon_d$ is the distortional strain and $\varepsilon_v$ is the volumetric strain, with $G = E/[2(1+\nu)]$, $K = E/[3(1-2\nu)]$ and $\lambda = K - 2G/3$. Locking was overcome by using reduced integration to evaluate the volumetric energy, which makes the dilatational strain constant within each element (Satish Chander & Prathap 1989).
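Selective reduced integration of (12) can be sketched for a 4-noded plane strain quadrilateral. The shear ($G$) term is integrated with 2×2 Gauss points and the volumetric ($\lambda$) term with a single centre point, which makes the volumetric strain constant in the element. The sketch assumes the common split $W = G\,\varepsilon_{ij}\varepsilon_{ij} + \tfrac{1}{2}\lambda\,\varepsilon_{kk}^2$. This corresponds to $[D] = \mathrm{diag}(2, 2, 1)$ acting on $(\varepsilon_x, \varepsilon_y, \gamma_{xy})$, and the two parts sum to the usual plane strain elasticity matrix. This choice of $[D]$, the node ordering and the function names are assumptions rather than details given in the paper.

```python
import numpy as np


def shape_gradients(xi, eta, xy):
    """Cartesian derivatives of bilinear shape functions and det(J) at (xi, eta).

    xy: (4, 2) nodal coordinates, nodes ordered counter-clockwise.
    """
    dN = 0.25 * np.array([[-(1 - eta), (1 - eta), (1 + eta), -(1 + eta)],
                          [-(1 - xi), -(1 + xi), (1 + xi), (1 - xi)]])
    J = dN @ xy                                  # [[dx/dxi, dy/dxi], [dx/deta, dy/deta]]
    return np.linalg.solve(J, dN), np.linalg.det(J)


def strain_matrix(dNxy):
    """B such that (eps_x, eps_y, gamma_xy) = B @ (u1, v1, ..., u4, v4)."""
    B = np.zeros((3, 8))
    B[0, 0::2] = dNxy[0]
    B[1, 1::2] = dNxy[1]
    B[2, 0::2] = dNxy[1]
    B[2, 1::2] = dNxy[0]
    return B


def quad_stiffness_sri(xy, E, nu, thickness=1.0):
    """Plane strain quad: 2x2 Gauss for the G term, 1 point for the lambda term."""
    G = E / (2.0 * (1.0 + nu))
    K_bulk = E / (3.0 * (1.0 - 2.0 * nu))
    lam = K_bulk - 2.0 * G / 3.0
    D_shear = G * np.diag([2.0, 2.0, 1.0])       # assumed [D] for the G term
    m = np.array([1.0, 1.0, 0.0])                # eps_v = eps_x + eps_y
    D_vol = lam * np.outer(m, m)
    Ke = np.zeros((8, 8))
    g = 1.0 / np.sqrt(3.0)
    for xi in (-g, g):                           # full 2x2 rule, unit weights
        for eta in (-g, g):
            dNxy, detJ = shape_gradients(xi, eta, xy)
            B = strain_matrix(dNxy)
            Ke += thickness * detJ * B.T @ D_shear @ B
    dNxy, detJ = shape_gradients(0.0, 0.0, xy)   # reduced 1-point rule, weight 4
    B = strain_matrix(dNxy)
    Ke += 4.0 * thickness * detJ * B.T @ D_vol @ B
    return Ke


xy = np.array([[0.0, 0.0], [2.0, 0.0], [2.0, 1.0], [0.0, 1.0]])
K = quad_stiffness_sri(xy, E=200e3, nu=0.4999)   # nearly incompressible material
print(np.round(np.linalg.eigvalsh(K), 1))
```

With full 2×2 integration of the $\lambda$ term, a nearly incompressible material would produce several very large eigenvalues that over-constrain the element. With the single-point rule, only one volumetric mode carries the large $\lambda$ stiffness.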

3.2 Contact stress analysis 

As observed earlier, several practical problems involve changing contact combined with nonlinear material behaviour. Numerical results for typical examples of such cases are presented in this paper.

Contact between two elastic bodies is represented by boundary conditions termed ambiguous. Consider, for example, the problem of contact between two bodies A and B shown in figure 3. Let $C_A$ and $C_B$ represent the regions in A and B, respectively, that could be in contact under a certain load distribution. The contact is assumed to be frictionless, so that nodes 1 and 2 in $C_A$ and $C_B$, respectively, can establish contact with each other and are free to slide. The boundary conditions on the interface can be written as follows.

Figure 3. A typical contact stress problem.

Case I: Nodes 1 and 2 are in contact.

$$F_{t1} = F_{t2} = 0, \quad U_{n1} = U_{n2}, \quad F_{n1} + F_{n2} = 0, \quad F_{n2} < 0, \qquad (13)$$

so that the contact surface carries a compressive load.

Case II: Nodes 1 and 2 are not in contact.

$$F_{t1} = F_{t2} = 0, \quad F_{n1} = F_{n2} = 0, \quad U_{n2} < U_{n1} + \Delta, \qquad (14)$$

where $n$ and $t$ refer to the directions normal and tangential to the contact surfaces, $F$ and $U$ represent forces and displacements respectively, and $\Delta$ is the appropriate initial gap between the points.

These inequality constraints are the ambiguous conditions. They ensure that the normal stress at the contact surface is compressive and that the displacements in the region of no contact do not overlap.

When the extent of contact/separation between the two bodies can change, the result is a moving boundary value problem. A node can lie either in the region of contact or in the region of separation, depending on the load level. The stress and displacement fields are nonlinear with load level.
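The conditions of Cases I and II translate into a status check for each node pair after a trial solution. The sketch below is a hypothetical helper that assumes the sign conventions of the inequalities stated above. A closed pair stays closed only while $F_{n2} < 0$, and an open pair closes once $U_{n2} < U_{n1} + \Delta$ is violated. How a changed pair is then handled, whether by re-solution or extrapolation, is a separate step.

```python
import numpy as np


def update_contact_status(closed, F_n2, U_n1, U_n2, gap):
    """Return the new open/closed status of each node pair and whether any changed.

    closed     : bool array, current status (True = Case I, in contact)
    F_n2       : normal force at node 2 (negative = compressive)
    U_n1, U_n2 : normal displacements of nodes 1 and 2
    gap        : initial gap Delta between the nodes
    """
    closed = np.asarray(closed, bool)
    F_n2, U_n1, U_n2, gap = (np.asarray(v, float) for v in (F_n2, U_n1, U_n2, gap))
    stays_closed = closed & (F_n2 < 0.0)               # Case I requires compression
    becomes_closed = ~closed & ~(U_n2 < U_n1 + gap)    # Case II inequality violated
    new_status = stays_closed | becomes_closed
    return new_status, bool(np.any(new_status != closed))


status, changed = update_contact_status([True, True, False, False],
                                        [-5.0, 2.0, 0.0, 0.0],
                                        [0.0, 0.0, 0.0, 0.0],
                                        [0.0, 0.0, 0.5, 1.5],
                                        [0.0, 0.0, 1.0, 1.0])
print(status, changed)
```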

When there is a possibility of changing extents of contact/separation between the two 
bodies, there would be a moving boundary value problem. A node could be either in the 
region of contact/separation depending on the load level. The stress and displacement fields 
are nonlinear with load level. 

3.3 Cyclic loading: Incremental-iterative solution

The software developed by combining elasto-plastic and contact-stress analysis is described
in a block diagram (figure 4). It can handle both loading and unloading by carefully
monitoring the two types of nonlinearity.

The solution is incremental-iterative in nature and primarily uses force/displacement
extrapolation for advancing/receding contact as part of a marching solution. The force
extrapolation for the case of receding contact is described below.

The solution takes advantage of the discrete character of the finite element solution. In
the absence of inelastic deformation, the solution is linear over the range of loading in
which contact extends from one node to the neighbouring node. In such a case, the solution
can be extrapolated linearly so that the forces at the end of the contact region are equal to
zero (figure 5). In the presence of inelastic deformation a similar approach is followed. Here,
the solution has only material nonlinearity (and no contact variation) while contact extends
from one node to the neighbouring node. Hence, a marching solution is carried out for the
contact progressing from node to node, and a purely material nonlinear analysis is conducted
between these node-to-node changes in contact. Treating the two types of nonlinearity
separately in this way saves a considerable amount of computational time.

Figure 4. Block diagram for elasto-plastic contact stress analysis.
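The linear extrapolation for receding contact amounts to locating the load at which the force at the last node of the contact region falls to zero. A minimal sketch, assuming the contact set is unchanged and the response is linear between two solved load levels; the function names are hypothetical.

```python
def load_at_zero_force(p1: float, f1: float, p2: float, f2: float) -> float:
    """Load level at which the end-node contact force reaches zero.

    Assumes the force varies linearly with load between two solved levels
    (p1, f1) and (p2, f2), i.e. the contact set does not change in between.
    """
    if f2 == f1:
        raise ValueError("force does not change with load; cannot extrapolate")
    return p1 - f1 * (p2 - p1) / (f2 - f1)


def extrapolate_field(field1, field2, p1: float, p2: float, p_star: float):
    """Linearly extrapolate any nodal field (forces, displacements) to load p_star."""
    s = (p_star - p1) / (p2 - p1)
    return [a + s * (b - a) for a, b in zip(field1, field2)]
```

In this sketch every nodal quantity is scaled by the same factor, so the state at `p_star` is the one at which the edge node of the contact region can be released before the next increment.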

3.4 Elasto-plastic fracture parameter, ΔT*

In the absence of crack closure, the elasto-plastic fracture parameter ΔT* (Atluri &
Nishioka 1982) was used in the present work to correlate fatigue crack growth data. This
path-independent integral can be evaluated as follows (figure 6):



Figure 5. Schematic variation of radial forces with applied load. 
Figure 6. Evaluation of ΔT*.


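$$\Delta T^*_{\varepsilon} = \int_{\Gamma_{\varepsilon}} \left[ \Delta W\, n_1 - (t_i + \Delta t_i)\, \Delta u_{i,1} - \Delta t_i\, u_{i,1} \right] d\Gamma \qquad (15)$$

where $\Delta W = (\sigma_{ij} + \tfrac{1}{2}\Delta\sigma_{ij})\,\Delta\varepsilon_{ij}$, $t_i = \sigma_{ij} n_j$, W is the strain energy density, $t_i$ is the
traction vector and $(\cdot)_{,1}$ denotes the derivative $\partial/\partial x_1$. This parameter can be evaluated along
a contour $\Gamma_\varepsilon$ very close to the crack tip.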
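As an illustration of how (15) might be evaluated from finite element output, the sketch below sums the integrand over discrete points on Γ_ε. It assumes a two-dimensional field, tensor (not engineering) strain components, and that stresses, strain increments, displacement gradients u_{i,1}, outward normals and segment lengths have already been sampled along the contour. The function name and argument layout are hypothetical.

```python
import numpy as np


def delta_t_star(sig, dsig, deps, normal, u_x1, du_x1, ds):
    """Discrete approximation of Delta T* on a near-tip contour (2D sketch).

    sig, dsig, deps : (k, 2, 2) stress, stress increment and strain increment
    normal          : (k, 2) outward unit normals n_j at the contour points
    u_x1, du_x1     : (k, 2) displacement gradients u_{i,1} and their increments
    ds              : (k,) length of contour associated with each point
    """
    sig, dsig, deps = (np.asarray(x, dtype=float) for x in (sig, dsig, deps))
    normal, u_x1, du_x1 = (np.asarray(x, dtype=float) for x in (normal, u_x1, du_x1))
    ds = np.asarray(ds, dtype=float)

    dW = np.einsum("kij,kij->k", sig + 0.5 * dsig, deps)  # (sigma + dsigma/2) : deps
    t = np.einsum("kij,kj->ki", sig, normal)              # t_i = sigma_ij n_j
    dt = np.einsum("kij,kj->ki", dsig, normal)            # Delta t_i
    integrand = (dW * normal[:, 0]
                 - np.einsum("ki,ki->k", t + dt, du_x1)
                 - np.einsum("ki,ki->k", dt, u_x1))
    return float(np.sum(integrand * ds))
```

Summing the increments returned for successive load steps then gives the total T*, as is done in the numerical studies below.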


4. Numerical results 

Numerical studies are conducted on two different problems in which elasto-plastic contact
stress analysis plays a significant role. In the first, crack-closure stresses are estimated by
FE analysis in a CTS specimen under two blocks of loading in Lo-Hi and Hi-Lo sequences.
The second problem is a lug joint with a rigid interference-fit pin and two diametrically
opposite cracks at the hole boundary, normal to the direction of loading.

4.1 Crack closure estimation 

Numerical studies are conducted on the compact tension specimen shown in figure 7. The
width of the specimen is W = 40 mm. The specimen is loaded by two pins as shown. The




Figure 7. Loading on compact tension specimen.










Figure 8. Development of plastic wake (label: residual plastic wake in the neighbourhood of the crack tip).

distance between the crack tip and the load line is denoted as the crack length a. The loading
is idealized as acting on half of the hole boundary in the form of a cosine distribution. The
equilibrium is given by

$$2\int_0^{\pi/2} \sigma_r\, r\cos\theta \, d\theta = P \qquad (16)$$

where r is the radius of the hole boundary.
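Under the additional assumptions that the cosine distribution has the form σ_r(θ) = σ_0 cos θ about the loading axis and that P is the load per unit thickness, (16) fixes the peak radial pressure σ_0, as this short worked step shows:

```latex
P = 2\int_0^{\pi/2} \sigma_0 \cos\theta \; r\cos\theta \, d\theta
  = 2\sigma_0 r \int_0^{\pi/2} \cos^2\theta \, d\theta
  = 2\sigma_0 r \cdot \frac{\pi}{4}
  \quad\Longrightarrow\quad
  \sigma_0 = \frac{2P}{\pi r}.
```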

There are a few issues to be considered in finite element analysis under cyclic loading.
(1) The first and foremost is crack-tip mesh refinement. This has been examined earlier
in the literature (Newman 1976; McClung & Sehitoglu 1989). It is generally found that
the mesh should be fine enough that, when the crack is advanced after every cycle, it
simulates real crack growth. On the other hand, if the mesh is made too fine, the
computational requirements become too large, and a balance has to be struck between these
two requirements. The aim of the exercise is to estimate the crack-closure stress accurately,
and beyond a certain level of mesh refinement the accuracy of this estimate does not improve.
Hence, in the current study, the crack-tip element size was chosen, after certain numerical
experiments, as 25 microns. (2) The crack-tip advance scheme is the next important issue.
As discussed in the introduction, several investigators used different schemes, and McClung
& Sehitoglu (1989) ultimately found that the difference between these two schemes is small.
In the present study, the crack-tip node release scheme suggested by Newman (1976) is
adopted. Here, the node is released at the maximum stress in each cycle and the resulting
unbalanced stresses are redistributed. (3) As the crack advances, it leaves a residual plastic
wake behind. Finite element simulation of this phenomenon for all the fatigue cycles would
be computationally impossible. Here, it is postulated that, when the crack is at a particular
length, the crack-closure stress estimates are influenced only by the residual plastic wake in
the immediate neighbourhood behind the crack tip. This is shown in figure 8. For this
purpose, the crack is grown over 10 to 15 cycles to simulate the residual plastic wake behind
it before the crack-closure stress is estimated at any crack length. Numerical studies showed
that this provides sufficiently accurate values of crack-closure stress. (4) Crack closure is
identified as the load at
Figure 9. Normalised crack closure and opening in low-high block of loading.


which the first pair of nodes behind the crack tip touch each other during the unloading 
part of the cycle. 
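The closure criterion in item (4) can be sketched as a search through the unloading steps of a cycle for the first step at which the first node pair behind the crack tip closes. The sketch assumes that the opening of that pair has been recorded at every unloading step (positive means open) and interpolates linearly between the two steps that bracket closure; the function name and the interpolation are illustrative assumptions, not part of the procedure described above.

```python
def closure_load(loads, gaps):
    """Load at which the first node pair behind the crack tip closes during unloading.

    loads : applied load at successive unloading steps (decreasing)
    gaps  : opening of that node pair at the same steps (> 0 means open)
    Returns the linearly interpolated load at which the gap reaches zero,
    or None if the pair stays open over the recorded steps.
    """
    for k in range(1, len(loads)):
        g_prev, g_curr = gaps[k - 1], gaps[k]
        if g_prev > 0.0 and g_curr <= 0.0:
            p_prev, p_curr = loads[k - 1], loads[k]
            return p_prev + (p_curr - p_prev) * g_prev / (g_prev - g_curr)
    return None
```

Normalising the returned load by the maximum load of the cycle gives a closure level comparable to the normalised values plotted in figures 9 and 10.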

The compact tension specimen is subjected to Hi-Lo and Lo-Hi blocks of loading. The
high block of loading corresponds to R = 0.3 ($\sigma_{min}/\sigma_{max} = 0.3$, $\sigma_{min} = 0.3$), and the low
block of loading to R = 0.1 ($\sigma_{min}/\sigma_{max} = 0.1$, $\sigma_{min} = 0.1$). In the high block of loading the
crack-closure stress stabilises at $\sigma_{cl} \approx 0.35$, and in the low block of loading at $\sigma_{cl} \approx 0.33$
(figures 9 and 10). This difference reflects the (minor) influence of the stress ratio on the
crack-closure stress. These values are comparable with those reported earlier by Seshadri
(1995). Between the high-low or low-high blocks of loading, the crack-closure stress
transits between these two values.

4.2 Preliminary studies on the ΔT* integral

Certain preliminary studies were conducted (Satish Kumar et al 1994) on the path-independent
integral ΔT* and compared with results in the literature. The analysis was conducted on the
CTS specimen shown in figure 11, with a crack length of 26.4 mm. The material properties
and the geometry of the specimen are the same as those used by Atluri & Nishioka (1982).

The load was applied as incremental displacements, and the resulting load-deflection curve
is shown in figure 12 along with comparative values from the earlier work. The results are
in good agreement. ΔT* is evaluated for each increment of displacement, and T* is obtained
by summation, $T^* = \sum \Delta T^*$. Results from the present analysis and those from the literature
are shown in figure 13 for different increments of displacement. The study was carried out
for the loading portion of a cycle




Figure 10. Normalised crack closure and opening in high-low block of loading.


and for a part of the unloading portion. The results compare very well and validate the
present software.

4.3 Cracks around lug joints

A study was then conducted on a cracked lug joint of a configuration for which certain
experimental results were available. The lug was made of HE-15 AST Al-Cu alloy, and the
geometry of the specimen is shown in figure 14. The lug was fitted with a 4340 steel
interference pin. For the numerical study, a steel pin with different interference values was
chosen. The diameter of the hole in the lug was 2a and the diameter of the pin was 2a(1 + λ),
so that λ represents the relative interference between the pin and the hole. Numerical studies
were conducted with λ = 0.5, 0.75 and 1.0%.



ΔT* was computed for different contours around the crack tip. The path independence
of this contour integral for two values of interference is shown in figure 15. It is now
proposed to use this integral for correlation of fatigue-crack growth data. For this purpose,
the variation of ΔT* with crack length for λ = 0.75% is shown in figure 16.

A power-law variation of fatigue-crack growth data with ΔT* was attempted for different
values of interference, using the FCG data presented earlier (Satish Kumar et al 1995). The
correlation of da/dN (crack growth per cycle) with ΔT* on a logarithmic scale is shown in
figure 17. The fit closely resembles the Paris equation.
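A Paris-type correlation da/dN = C (ΔT*)^m is a straight line on logarithmic axes, so C and m can be obtained by linear least squares on the logarithms. The sketch below is a generic illustration of that fit, not the authors' procedure, and the data in the usage check are synthetic rather than the FCG data of Satish Kumar et al (1995).

```python
import numpy as np


def fit_power_law(delta_t_star, da_dn):
    """Least-squares fit of da/dN = C * (Delta T*)**m on log-log axes.

    Returns (C, m). Both inputs must be positive.
    """
    x = np.log10(np.asarray(delta_t_star, dtype=float))
    y = np.log10(np.asarray(da_dn, dtype=float))
    m, log_c = np.polyfit(x, y, 1)  # slope and intercept of the straight line
    return 10.0 ** log_c, m
```

Fitting each interference value separately, as in figure 17, then shows whether a single (C, m) pair describes all the data.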


5. Concluding remarks 

The role of elasto-plastic analysis under cyclic loading in fatigue crack growth studies
is briefly reviewed. Finite element analysis of typical problems involving two nonlinearities,
namely material nonlinearity and changing contact, is presented. Numerical estimates of
crack-closure stresses in CTS specimens under Hi-Lo and Lo-Hi blocks of loading are
presented using the methods and procedures developed. In the absence of crack closure,
fatigue-crack growth in structural components is correlated with the elasto-plastic fracture
parameter ΔT* proposed earlier in the literature, and the correlation of fatigue-crack growth
data with this path-independent integral is presented.

Figure 15. Path independence of ΔT* integral.


The work presented in this paper was supported by sponsored research projects of the
Aeronautics R & D Board, Government of India, and this support is acknowledged. The author
thanks his colleague Prof T S Ramamurthy and his students Dr B R Seshadri, Dr K Satish
Kumar and Dr K S Venkatesh for their contributions to this research work.
Low velocity impact analysis of composite sandwich shells using higher-order shear deformation theories

DIPAK K MAITI and P K SINHA 

Department of Aerospace Engineering, Indian Institute of Technology,

Kharagpur 721302, India 

e-mail: pksinha@aero.iitkgp.ernet.in

Abstract. In the present investigation, higher-order and conventional first-order shear
deformation theories are used to study the impact response of composite sandwich shells.
The formulation is based on Donnell's shallow shell theory. Nine-noded Lagrangian elements
are used for the finite element formulation. A modified Hertzian contact law is used to
calculate the contact force. The results obtained from the present investigation compare well
with those available in the open literature. Numerical results are presented to show how the
impact response changes as the core depth increases from zero to a specified value and as
the core stiffness changes for a particular core depth.

Keywords. Composite sandwich shell; contact force; finite element analysis; 
Hertzian contact; low velocity impact; transverse shear stress. 

1. Introduction

Composites and composite sandwiches, due to their high specific strengths and stiffnesses
and other attributes, are normally favoured in the design of several aircraft structural
components. Some of these structural components are likely to experience low velocity
impact during their manufacture, storage or service life. The impact due to a tool drop and
hits by flying debris, birds, hail stones etc. are common examples of low velocity impact.
An understanding of the dynamic response of composite and sandwich structures subjected
to low velocity impact is, therefore, necessary for design and assessment of […].

Design/analysis techniques for the response of composite materials and structures to static
loads and simpler forms of dynamic loads are well established. However, not much attention
has been directed to the impact response of laminated composite structures and composite
sandwiches. It is only in recent years that there has been a growing interest in investigating
impact related problems, especially those involving composite materials. A significant
contribution to the impact behaviour of composite laminates was made by […]un (1981, 1982)
and Tan & Sun (1985). Based on an experimental investigation, they proposed empirical
relations for the contact force due to loading, unloading and
reloading during the impact process. Tan & Sun (1985) also analysed the impact response
of laminated composite plates using the finite element method (FEM) and a modified contact
law for composite laminates. The transient response of laminated composite structures
under transverse impact was also studied by Cairns & Lagace (1989) and Wu & Chang
(1989) using FEM. Wu & Springer (1988) employed a three-dimensional transient FEM
using 8-noded brick elements to determine the size and location of delaminations in
laminated composite plates subjected to non-penetrating impact. Maiti & Sinha (1995a, b)
studied the impact behaviour of thick laminated composite beams and plates using FEM
based on higher-order shear deformation theories.

The low velocity impact response of composite shells is of paramount importance in 
view of the extensive use of composites in aerospace applications. But the information 
available in the open literature on the subject matter is very limited. A double Fourier series 
expansion method to study the impact response of simply supported cylindrical shells was 
used by Christoforou & Swanson (1990). The impact induced fracture in laminated plates 
and cylindrical shells was studied by Lin & Lee (1990) using an experimental technique 
as well as FEM. The impact response of composite cylinders using a mixed finite element 
method and Tsai-Wu failure criterion was investigated by Bachrach & Hansen (1989). 
In a recent investigation (Maiti & Sinha 1995c), we employed FEM to study the impact 
behaviour of doubly curved laminated composite shells. 

Sandwich constructions are efficient in terms of stiffness and weight and are therefore
increasingly used in the aerospace industry. The effects of face lay-up sequence and core
density on the impact response of sandwich plates were investigated by Kim & Jun (1992),
who observed that a small relative orientation between adjacent plies and a higher-density
core are desirable in sandwich plates to reduce impact delamination. The low-velocity impact
response of foam-core composites with fibre glass/epoxy face sheets was treated by a
combination of computational and experimental methods by Nemes & Simmonds (1992).
They used four-noded constant strain quadrilateral elements and linear elastic constitutive
models for the face sheets, and a phenomenological constitutive relation for the
epoxy-bonded layer along with the foam core. However, the literature available on the impact
response of composite sandwiches is so meagre that no meaningful conclusion can be drawn
about the actual behaviour. Moreover, polymer composite faces in general, and common core
materials in particular, exhibit low transverse shear modulus and strength properties. This
may require the use of higher-order shear deformation theories for accurate estimation of
transverse shear stresses and for subsequent prediction of interlaminar failure and
identification of damage zones.

In the present investigation, the higher-order shear deformation theories (HST9, HST11 and
HST12) as well as the conventional first-order shear deformation theory (FST) are employed
to develop a finite element method to investigate the impact behaviour of doubly-curved
composite sandwich shells. The finite element method uses nine-noded quadrilateral elements
of the Lagrange family, and the shell behaviour is based on Donnell's shallow shell theory.
The results show how the impact response changes as the core thickness increases from zero
to some specified values and as the core stiffness changes for a particular core depth. The
present FEM also provides a means for a comparative assessment of the various forms of
shear deformation theory in the present application.

Figure 1. Composite sandwich shell and 9-noded isoparametric element configuration. 


2. Formulation 

A doubly-curved sandwich composite shell configuration and a schematic view of a
nine-noded quadrilateral isoparametric element are shown in figure 1. In the present sandwich
construction, a thick core of low-density material is bonded to two face sheets of composite
laminates having arbitrary lamina thicknesses, materials and fibre orientations. The
displacement components at any point (x, y, z) along three mutually perpendicular directions
are expressed as follows.

(i) First-order shear deformation theory (FST):

$$u(x,y,z,t) = u_0(x,y,t) + z\,\theta_x(x,y,t),$$
$$v(x,y,z,t) = v_0(x,y,t) + z\,\theta_y(x,y,t),$$
$$w(x,y,z,t) = w_0(x,y,t). \qquad (1)$$






Higher-order shear deformation theories (HST):

(ii) Nine-degrees of freedom system (HST9):

$$u(x,y,z,t) = u_0(x,y,t) + z\,\theta_x(x,y,t) + z^2 u_0^*(x,y,t) + z^3 \theta_x^*(x,y,t),$$
$$v(x,y,z,t) = v_0(x,y,t) + z\,\theta_y(x,y,t) + z^2 v_0^*(x,y,t) + z^3 \theta_y^*(x,y,t),$$
$$w(x,y,z,t) = w_0(x,y,t);$$

(iii) Eleven-degrees of freedom system (HST11):

$$u(x,y,z,t) = u_0(x,y,t) + z\,\theta_x(x,y,t) + z^2 u_0^*(x,y,t) + z^3 \theta_x^*(x,y,t),$$
$$v(x,y,z,t) = v_0(x,y,t) + z\,\theta_y(x,y,t) + z^2 v_0^*(x,y,t) + z^3 \theta_y^*(x,y,t),$$
$$w(x,y,z,t) = w_0(x,y,t) + z\,\theta_z(x,y,t) + z^2 w_0^*(x,y,t);$$

(iv) Twelve-degrees of freedom system (HST12):

$$u(x,y,z,t) = u_0(x,y,t) + z\,\theta_x(x,y,t) + z^2 u_0^*(x,y,t) + z^3 \theta_x^*(x,y,t),$$
$$v(x,y,z,t) = v_0(x,y,t) + z\,\theta_y(x,y,t) + z^2 v_0^*(x,y,t) + z^3 \theta_y^*(x,y,t),$$
$$w(x,y,z,t) = w_0(x,y,t) + z\,\theta_z(x,y,t) + z^2 w_0^*(x,y,t) + z^3 \theta_z^*(x,y,t);$$

where $u_0, v_0, w_0$ and $\theta_x, \theta_y, \theta_z$ are midplane displacements and rotations, and $u_0^*, v_0^*, w_0^*$
and $\theta_x^*, \theta_y^*, \theta_z^*$ are the corresponding higher-order terms in the Taylor series expansion.
Typical strain-displacement relations for HST12 based on Donnell's shallow shell theory
are expressed as
where $\varepsilon_{xx}, \varepsilon_{yy}, \varepsilon_{zz}$ etc. are engineering strains and $\bar{\varepsilon}_{xx}, \bar{\varepsilon}_{yy}, \bar{\varepsilon}_{zz}$ etc. are generalised strain
components, which are expressed in terms of the displacements ($u_0, v_0, w_0, \theta_x, \theta_y, \theta_z$ etc.)
as shown in (A1) in the appendix. Note that the term $k_{zz}$ does not exist in the present
case.
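The four displacement fields differ only in which powers of z are retained: HST11 is HST12 with θ_z* = 0, HST9 further sets θ_z = w_0* = 0, and FST also drops u_0*, v_0*, θ_x* and θ_y*. A single HST12 evaluator therefore covers all of them, as in the sketch below; the class and function names are hypothetical, and the starred terms are stored in fields with an `_s` suffix.

```python
from dataclasses import dataclass


@dataclass
class GeneralisedDisplacements:
    """Midplane displacements/rotations and their higher-order (starred) terms at a point."""
    u0: float = 0.0
    v0: float = 0.0
    w0: float = 0.0
    theta_x: float = 0.0
    theta_y: float = 0.0
    theta_z: float = 0.0
    u0_s: float = 0.0       # u0*
    v0_s: float = 0.0       # v0*
    w0_s: float = 0.0       # w0*
    theta_x_s: float = 0.0  # theta_x*
    theta_y_s: float = 0.0  # theta_y*
    theta_z_s: float = 0.0  # theta_z*


def displacements_hst12(g: GeneralisedDisplacements, z: float):
    """Return (u, v, w) at distance z from the midplane using the HST12 expansion."""
    u = g.u0 + z * g.theta_x + z**2 * g.u0_s + z**3 * g.theta_x_s
    v = g.v0 + z * g.theta_y + z**2 * g.v0_s + z**3 * g.theta_y_s
    w = g.w0 + z * g.theta_z + z**2 * g.w0_s + z**3 * g.theta_z_s
    return u, v, w
```

Leaving the unused fields at their default of zero reproduces the lower-order theories without separate code paths.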

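Nine-noded Lagrangian quadratic elements are used for the finite element formulation.
The shape functions for a nine-noded quadrilateral isoparametric element are

$$N_i = \tfrac{1}{4}(1+\xi\xi_i)(1+\eta\eta_i)\,\xi\xi_i\,\eta\eta_i, \qquad i = 1, 2, 3, 4,$$
$$N_i = \tfrac{1}{2}(1-\xi^2)(1+\eta\eta_i)\,\eta\eta_i, \qquad i = 5, 7,$$
$$N_i = \tfrac{1}{2}(1+\xi\xi_i)(1-\eta^2)\,\xi\xi_i, \qquad i = 6, 8,$$
$$N_9 = (1-\xi^2)(1-\eta^2), \qquad (6)$$

where $(\xi_i, \eta_i)$ are the natural coordinates of node i.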
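The shape functions can be checked numerically: they must equal 1 at their own node and 0 at the other eight, and sum to 1 everywhere. The sketch below assumes a node numbering (corners 1-4 anticlockwise from (-1, -1), mid-side nodes 5 and 7 on η = ∓1, 6 and 8 on ξ = ±1, centre node 9) chosen to match the index ranges in the formulas; the paper's own numbering may differ.

```python
import numpy as np

# Natural coordinates (xi_i, eta_i) of the nine nodes (assumed numbering).
NODES = np.array([
    (-1.0, -1.0), (1.0, -1.0), (1.0, 1.0), (-1.0, 1.0),  # corner nodes 1-4
    (0.0, -1.0), (1.0, 0.0), (0.0, 1.0), (-1.0, 0.0),    # mid-side nodes 5-8
    (0.0, 0.0),                                          # centre node 9
])


def shape_functions(xi, eta):
    """Values of N_1..N_9 at the natural coordinates (xi, eta)."""
    n = np.empty(9)
    for i, (xi_i, eta_i) in enumerate(NODES[:4]):      # corners
        n[i] = 0.25 * (1 + xi * xi_i) * (1 + eta * eta_i) * xi * xi_i * eta * eta_i
    for i in (4, 6):                                   # nodes 5 and 7 (xi_i = 0)
        eta_i = NODES[i, 1]
        n[i] = 0.5 * (1 - xi**2) * (1 + eta * eta_i) * eta * eta_i
    for i in (5, 7):                                   # nodes 6 and 8 (eta_i = 0)
        xi_i = NODES[i, 0]
        n[i] = 0.5 * (1 + xi * xi_i) * (1 - eta**2) * xi * xi_i
    n[8] = (1 - xi**2) * (1 - eta**2)                  # centre node
    return n


def interpolate(nodal_values, xi, eta):
    """Interpolate a nodal quantity, e.g. u_0 = sum_i N_i u_0i."""
    return float(shape_functions(xi, eta) @ np.asarray(nodal_values, dtype=float))
```

The same `interpolate` call applies to every generalised displacement (u_0, θ_x, u_0* and so on), since all of them share the nine shape functions.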
The displacements ($u_0, v_0, w_0, \theta_x, \theta_y, \theta_z$ etc.) at a point within the element are expressed in
terms of the interpolation functions and nodal degrees of freedom as follows:

$$u_0 = \sum_{i=1}^{9} N_i u_{0i}, \quad v_0 = \sum_{i=1}^{9} N_i v_{0i}, \quad w_0 = \sum_{i=1}^{9} N_i w_{0i},$$
$$\theta_x = \sum_{i=1}^{9} N_i \theta_{xi}, \quad \theta_y = \sum_{i=1}^{9} N_i \theta_{yi}, \quad \theta_z = \sum_{i=1}^{9} N_i \theta_{zi},$$
$$u_0^* = \sum_{i=1}^{9} N_i u_{0i}^*, \quad v_0^* = \sum_{i=1}^{9} N_i v_{0i}^*, \quad w_0^* = \sum_{i=1}^{9} N_i w_{0i}^*,$$
$$\theta_x^* = \sum_{i=1}^{9} N_i \theta_{xi}^*, \quad \theta_y^* = \sum_{i=1}^{9} N_i \theta_{yi}^*, \quad \theta_z^* = \sum_{i=1}^{9} N_i \theta_{zi}^*. \qquad (7)$$

Combining (5) and (7) in conjunction with (A1), the strain-displacement relations are
expressed as

$$\{\varepsilon\} = [B]\{u_e\}, \qquad (8)$$

where [B] is the strain-displacement matrix presented in the appendix and $\{u_e\}$ is the
element displacement vector.

The dynamic equilibrium equation for a finite element is derived using Hamilton's
principle as

$$\delta \int_{t_1}^{t_2} L_e \, dt = 0, \qquad (9)$$

where $L_e$ is the Lagrange energy function.

Substituting the energy expressions and performing the integration, the dynamic
equilibrium equation becomes

$$[M_e]\{\ddot{u}_e\} + [K_e]\{u_e\} = \{F_e\}, \qquad (10)$$

where the element mass matrix $[M_e]$ can be expressed as

$$[M_e] = \iint [N]^T [\rho] [N] \, dx\, dy, \qquad (11)$$

where [N] is the shape function matrix and $[\rho]$ is the inertia matrix as given in (7) by Maiti
& Sinha (1994).

Similarly, the element stiffness matrix $[K_e]$ is given as

$$[K_e] = \iint [B]^T [D] [B] \, dx\, dy, \qquad (12)$$

where [B] is the strain-displacement matrix listed in the appendix and [D] is the rigidity
matrix reported by Maiti & Sinha (1994), which is based on three-dimensional anisotropic
constitutive relations.

After assembling all the element mass and stiffness matrices and the force vector with 
respect to the common global coordinates, the resulting equilibrium equation becomes 
(i) Forced vibration equation

$$[M]\{\ddot{u}\} + [K]\{u\} = \{F\}. \qquad (13)$$

(ii) Free-vibration equation

$$[M]\{\ddot{u}\} + [K]\{u\} = 0. \qquad (14)$$

(iii) Bending

$$[K]\{u\} = \{F\}, \qquad (15)$$

where [M] and [K] are the global mass and stiffness matrices and $\{u\}$, $\{\ddot{u}\}$ and $\{F\}$ are the
global displacement, acceleration and force vectors, respectively. For the impact problem,
{F} is given as

$$\{F\} = [0 \;\; 0 \;\; 0 \;\; \cdots \;\; F_c \;\; \cdots \;\; 0 \;\; 0 \;\; 0]^T. \qquad (16)$$

Note that $F_c$ is the contact force corresponding to the contact point.
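Assembling element matrices "with respect to the common global coordinates" amounts to adding each element matrix into the rows and columns of its global degrees of freedom, and the impact load vector of (16) has a single non-zero entry. A minimal sketch, assuming dense NumPy storage for clarity (a production code would more likely use banded or sparse storage) and hypothetical function names:

```python
import numpy as np


def assemble(n_dof, element_matrices, element_dofs):
    """Scatter-add element matrices (stiffness or mass) into a dense global matrix."""
    glob = np.zeros((n_dof, n_dof))
    for ke, dofs in zip(element_matrices, element_dofs):
        idx = np.asarray(dofs)
        glob[np.ix_(idx, idx)] += np.asarray(ke, dtype=float)
    return glob


def impact_load_vector(n_dof, contact_dof, f_c):
    """Global force vector of (16): zero except F_c at the contact degree of freedom."""
    f = np.zeros(n_dof)
    f[contact_dof] = f_c
    return f
```

The same `assemble` call is used for the mass matrices; each element's `dofs` list maps its local degrees of freedom to global ones and must not contain repeated entries.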

The dynamic equilibrium of the impactor can be expressed as

$$m_i \ddot{w}_i + F_c = 0, \qquad (17)$$

where $m_i$ and $\ddot{w}_i$ are the impactor mass and acceleration, respectively.

Equations (13)-(15) govern the structural response, while (17) defines the impactor
motion. Note that the contact force vector {F} must be calculated before the target response
can be analysed. Equations (13) and (17) are solved using Newmark's time integration
scheme, while (14) and (15) are solved by the subspace iteration method and the Gauss
elimination method, respectively.
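The coupled solution of (13) and (17) can be sketched as follows. This is an assumption-laden illustration, not the authors' program: it uses the average-acceleration Newmark parameters (β = 1/4, γ = 1/2), an undamped target initially at rest, only the loading law F_c = n α^{3/2} of section 3, a plain fixed-point iteration on F_c within each step, and the sign convention that the indentation is α = w_i - w at the contact point. Function and argument names are hypothetical.

```python
import numpy as np


def simulate_impact(M, K, contact_dof, n_contact, m_imp, v0, dt, n_steps,
                    beta=0.25, gamma=0.5, max_iter=50, tol=1e-10):
    """Newmark solution of [M]{u''} + [K]{u} = {F} coupled with m_i w_i'' + F_c = 0.

    Returns the contact force and impactor velocity at the end of every step.
    """
    M = np.asarray(M, dtype=float)
    K = np.asarray(K, dtype=float)
    ndof = M.shape[0]
    a0 = 1.0 / (beta * dt**2)
    a2 = 1.0 / (beta * dt)
    a3 = 1.0 / (2.0 * beta) - 1.0
    K_eff = K + a0 * M                     # effective stiffness for the target

    u, v, a = np.zeros(ndof), np.zeros(ndof), np.zeros(ndof)  # target at rest
    wi, vi, ai = 0.0, v0, 0.0              # impactor just touching, speed v0
    fc = 0.0
    forces, imp_vel = [], []

    def trial(f):
        """Target and impactor state at t + dt for a trial contact force f."""
        F = np.zeros(ndof)
        F[contact_dof] = f
        u_new = np.linalg.solve(K_eff, F + M @ (a0 * u + a2 * v + a3 * a))
        ai_new = -f / m_imp                # impactor equation (17)
        wi_new = wi + dt * vi + dt**2 * ((0.5 - beta) * ai + beta * ai_new)
        return u_new, wi_new, ai_new

    for _ in range(n_steps):
        for _ in range(max_iter):          # fixed-point iteration on F_c
            u_new, wi_new, _ = trial(fc)
            alpha = wi_new - u_new[contact_dof]          # local indentation
            fc_new = n_contact * max(alpha, 0.0) ** 1.5  # Hertzian loading law
            done = abs(fc_new - fc) <= tol * max(1.0, fc_new)
            fc = fc_new
            if done:
                break
        u_new, wi_new, ai_new = trial(fc)  # commit the step with the final F_c
        a_new = a0 * (u_new - u) - a2 * v - a3 * a
        v = v + dt * ((1.0 - gamma) * a + gamma * a_new)
        vi = vi + dt * ((1.0 - gamma) * ai + gamma * ai_new)
        u, a, wi, ai = u_new, a_new, wi_new, ai_new
        forces.append(fc)
        imp_vel.append(vi)
    return np.array(forces), np.array(imp_vel)
```

For a shell model, M and K would be the assembled global matrices, and the unloading and reloading laws of section 3 would replace the loading branch once the indentation starts to decrease. The fixed-point iteration is only expected to converge for sufficiently small time steps.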


3. Contact laws 


During loading, the contact force is calculated using the modified Hertzian contact law

$$F_c = n\,\alpha^{3/2}, \qquad (18)$$

where α is the local indentation and n is the modified contact stiffness for composite
materials proposed by Yang & Sun (1982):

$$n = \frac{4}{3}\,\frac{\sqrt{R_i}}{(1-\nu_i^2)/E_i + 1/E_{33}} \qquad \text{for plate},$$
$$n = \frac{4}{3}\left[\frac{1}{1/R_i + 1/(2R_s)}\right]^{1/2} \frac{1}{(1-\nu_i^2)/E_i + 1/E_{33}} \qquad \text{for cylindrical shell},$$
$$n = \frac{4}{3}\left[\frac{1}{1/R_i + 1/R_s}\right]^{1/2} \frac{1}{(1-\nu_i^2)/E_i + 1/E_{33}} \qquad \text{for spherical shell}, \qquad (19)$$
where $R_i$, $E_i$ and $\nu_i$ are the radius, modulus of elasticity and Poisson's ratio of the impactor,
and $R_s$ and $E_{33}$ are the radius and transverse modulus of elasticity of the composite cylindrical
and spherical shell targets.
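The three expressions in (19) share the same compliance term and differ only in the effective radius, which makes them easy to evaluate together. A minimal sketch (hypothetical function name, SI units assumed); the usage check takes the impactor and face properties listed in section 4.2 and assumes, for illustration, that R_s equals the curved-direction radius R_y = 5a = 1.0 m.

```python
import math


def contact_stiffness(r_i, e_i, nu_i, e_33, target="plate", r_s=None):
    """Modified Hertzian contact stiffness n for plate, cylindrical or spherical targets."""
    compliance = (1.0 - nu_i**2) / e_i + 1.0 / e_33
    if target == "plate":
        r_eff = r_i
    elif target == "cylinder":
        r_eff = 1.0 / (1.0 / r_i + 1.0 / (2.0 * r_s))
    elif target == "sphere":
        r_eff = 1.0 / (1.0 / r_i + 1.0 / r_s)
    else:
        raise ValueError(f"unknown target type: {target!r}")
    return 4.0 / 3.0 * math.sqrt(r_eff) / compliance
```

Because the effective radius can only be smaller than R_i for a convex shell target, the sketch gives n(plate) > n(cylinder) > n(sphere) for the same impactor, and both shell values tend to the plate value as R_s becomes very large.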

Upon unloading, the contact force is simulated by the relation

$$F_c = F_m \left[ \frac{\alpha - \alpha_0}{\alpha_m - \alpha_0} \right]^{2.5}, \qquad (20)$$

and for reloading the indentation law is modified as

$$F_c = F_m \left[ \frac{\alpha - \alpha_0}{\alpha_m - \alpha_0} \right]^{1.5}, \qquad (21)$$

where $F_m$ is the maximum contact force just before unloading and $\alpha_m$ is the maximum local
indentation during this loading/unloading process. The permanent indentation $\alpha_0$ is determined
from the following expressions:

$$\alpha_0 = 0, \qquad \text{when } \alpha_m < \alpha_{cr},$$
$$\alpha_0 = \alpha_m \left[ 1 - \left( \alpha_{cr}/\alpha_m \right)^{2/5} \right], \qquad \text{when } \alpha_m \ge \alpha_{cr}, \qquad (22)$$


where $\alpha_{cr}$ is the critical indentation beyond which permanent indentation occurs; it is
approximately $8.0264 \times 10^{-5}$ m (0.00316 in.) for a graphite-epoxy composite face.
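The loading, unloading and reloading branches can be collected in one function that an impact solver calls with the current branch. This sketch reads the permanent-indentation rule with the 2/5 exponent applied to the ratio α_cr/α_m, so that α_0 vanishes continuously at α_m = α_cr; that reading of the original expression, and the function names, are assumptions.

```python
ALPHA_CR = 8.0264e-5  # critical indentation (m) quoted above for a graphite-epoxy face


def permanent_indentation(alpha_m, alpha_cr=ALPHA_CR):
    """Permanent indentation alpha_0 after a maximum indentation alpha_m."""
    if alpha_m < alpha_cr:
        return 0.0
    return alpha_m * (1.0 - (alpha_cr / alpha_m) ** 0.4)


def contact_force(alpha, n, branch, f_m=0.0, alpha_m=0.0, alpha_cr=ALPHA_CR):
    """Contact force for the 'loading', 'unloading' or 'reloading' branch."""
    if branch == "loading":
        return n * max(alpha, 0.0) ** 1.5                 # Hertzian loading law
    if alpha_m <= 0.0:
        raise ValueError("unloading/reloading needs the maximum indentation alpha_m > 0")
    alpha_0 = permanent_indentation(alpha_m, alpha_cr)
    ratio = max(alpha - alpha_0, 0.0) / (alpha_m - alpha_0)
    if branch == "unloading":
        return f_m * ratio ** 2.5                         # unloading law
    if branch == "reloading":
        return f_m * ratio ** 1.5                         # reloading law
    raise ValueError(f"unknown branch: {branch!r}")
```

Both unloading and reloading curves pass through (α_m, F_m) and reach zero force at the permanent indentation α_0, which is what makes the branches join continuously.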


4. Numerical results and discussion 

Based on the above finite element procedure, computer programs are developed to study the
impact behaviour of laminated sandwich shells. The programs are coded in Fortran-77, and
the analysis is carried out on a 486 (Oasys) system under a Unix environment. For the
first-order shear deformation theory, a shear correction factor of 5/6 is used to modify the
shear energy; no shear correction factor is used for the higher-order shear deformation
theories. The following boundary conditions are used:

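Simply supported:

$$v_0 = w_0 = \theta_y = \theta_z = v_0^* = w_0^* = \theta_y^* = \theta_z^* = 0 \qquad \text{at } x = 0, a,$$
$$u_0 = w_0 = \theta_x = \theta_z = u_0^* = w_0^* = \theta_x^* = \theta_z^* = 0 \qquad \text{at } y = 0, b.$$

Clamped-clamped support:

$$u_0 = v_0 = w_0 = \theta_x = \theta_y = \theta_z = u_0^* = v_0^* = w_0^* = \theta_x^* = \theta_y^* = \theta_z^* = 0 \qquad \text{at } x = 0, a \text{ and at } y = 0, b.$$

Clamped-free support:

$$u_0 = v_0 = w_0 = \theta_x = \theta_y = \theta_z = u_0^* = v_0^* = w_0^* = \theta_x^* = \theta_y^* = \theta_z^* = 0 \qquad \text{at } x = 0,$$

and $u_0, v_0, w_0$, etc. are not specified at the other edges.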
Table 1. Non-dimensionalised central deflections and stresses for a simply supported square
sandwich plate subjected to sinusoidal load (4 × 4 mesh, quarter plate).

Method   w̄        σ̄xx(1)    σ̄xx(2)    σ̄yy       τ̄xy      τ̄yz      τ̄xz

S = 100
FST      0.8851   ±1.1242   ±0.8994   ±0.0559   0.0445   0.0258   0.3001
HST9     0.8910   ±1.1246   ±0.8969   ±0.0562   0.0447   0.0265   0.3069
HST11    0.8867   ±1.1225   ±0.8952   ±0.0583   0.0445   0.0270   0.3064
HST12    0.8867   ±1.1225   ±0.8952   ±0.0583   0.0445   0.0270   0.3064
Exact    -        ±1.0980   ±0.8750   ±0.0550   0.0437   0.0297   0.3240

S = 50
FST      0.9062   ±1.225    ±0.8980   ±0.0568   0.0449   0.0266   0.3046
HST9     0.9293   ±1.1267   ±0.8907   ±0.0580   0.0456   0.0270   0.3040
HST11    0.9253   ±1.1246   ±0.8890   ±0.0601   0.0454   0.0275   0.3034
HST12    0.9253   ±1.1246   ±0.8890   ±0.0601   0.0454   0.0275   0.3034
Exact    -        ±1.099    ±0.8670   ±0.0569   0.0446   0.0306   0.3230

S = 20
FST      1.0524   ±1.1105   ±0.8884   ±0.0628   0.0477   0.0288   0.3018
HST9     1.1944   ±1.1372   ±0.8440   ±0.0699   0.0516   0.0313   0.2984
HST11    1.1901   ±1.1352   ±0.8423   ±0.0722   0.0514   0.0318   0.2978
HST12    1.1901   ±1.1352   ±0.8423   ±0.0722   0.0514   0.0318   0.2978
Exact    -        ±1.1100   ±0.8100   ±0.0700   0.0511   0.0361   0.3170

S = 10
FST      1.5605   ±1.0720   ±0.8576   ±0.0818   0.0565   0.0361   0.2940
HST9     2.0849   ±1.1784   ±0.6928   ±0.1068   0.0705   0.0446   0.2821
HST11    2.0807   ±1.1765   ±0.6913   ±0.1094   0.0701   0.0450   0.2815
HST12    2.0807   ±1.1765   ±0.6913   ±0.1095   0.0701   0.0450   0.2815
Exact    -        ±1.1520   ±0.629    ±0.1099   0.0717   0.0527   0.3000

Exact values correspond to those of Pagano (1970).
$\bar{w} = 100 E_{22f} w/(q_0 h S^4)$, $(\bar{\sigma}_{xx}, \bar{\sigma}_{yy}, \bar{\tau}_{xy}) = (\sigma_{xx}, \sigma_{yy}, \tau_{xy})/(q_0 S^2)$, $(\bar{\tau}_{xz}, \bar{\tau}_{yz}) = (\tau_{xz}, \tau_{yz})/(q_0 S)$, $S = a/h$;
$\bar{\sigma}_{xx}(1)$ at (a/2, b/2, ±h/2), $\bar{\sigma}_{xx}(2)$ at (a/2, b/2, ±0.4h), $\bar{\sigma}_{yy}$ at (a/2, b/2, ±h/2),
$\bar{\tau}_{xy}$ at (0, 0, ±h/2), $\bar{\tau}_{yz}$ at (a/2, 0, ±h/2), $\bar{\tau}_{xz}$ at (0, b/2, 0).


4.1 Comparison of results

To validate the present finite element formulation, bending and free vibration results are
compared with those existing in the literature. Non-dimensionalised central deflections
($\bar{w}$) and stresses ($\bar{\sigma}_{xx}, \bar{\sigma}_{yy}, \bar{\tau}_{xy}, \bar{\tau}_{yz}, \bar{\tau}_{xz}$) for a simply supported sandwich plate subjected
to sinusoidal surface loading are presented in table 1. The material properties and lay-ups
are those assumed by Pagano (1970). The results agree well for thin sandwich plates, but
differences appear as the a/h ratio decreases. In comparison to the first-order shear
deformation theory, the higher-order shear deformation theories
Table 2. Natural frequency (Hz) for a simply supported sandwich plate.

Mode    Raville-Ueng   Raville-Ueng   Khatua-Cheung   Present   Present   Present   Present
m, n    Exp.           Theory         FEM             FST       HST9      HST11     HST12
1,1     -              23             23              23.289    23.327    23.326    23.326
2,1     45             45             45              44.178    44.303    44.307    44.307
1,2     69             71             71              69.689    70.056    70.061    70.061
3,1     78             80             82              79.918    80.167    80.178    80.178
2,2     92             91             92              89.059    89.651    89.676    89.676
3,2     129            126            128             123.499   124.374   124.401   124.401
4,1     133            129            136             128.657   129.271   129.356   129.356
1,3     152            146            150             143.280   144.754   144.780   144.780


yield results closer to the exact solution of Pagano (1970). The natural frequencies (Hz) 
for a composite sandwich plate using the present FEM analysis (table 2) are also found to 
compare well with those of Khatua & Cheung (1973) and Raville & Ueng (1967). 

The impact response is analysed using both higher-order and first-order shear deformation
theories for the target structure (a simply supported isotropic plate) of Goldsmith (1960).
The results are shown in figure 2, where the contact force variation, impactor displacement
($w_i$), target point displacement (w) and velocity profile ($\dot{w}_i$) of the impactor are plotted.
From figure 2 it is observed that the contact force, impactor displacement ($w_i$) and velocity
profile match well, but some discrepancies are observed in the target point displacement
response. Figures 3 and 4 show the contact force variation and the displacement response
of the target point for a laminated composite plate centrally impacted by a spherical steel
impactor with an initial velocity of 3 m/s. The material properties are the same as those used
by Sun & Chen (1985). The results are plotted together with those of Sun & Chen (1985) and
Cairns & Lagace (1989). Here also differences are observed, but the nature of the variation
is the same. This difference, particularly in the displacement response (figure 4), though not
significant, may be attributed to differences in the contact stiffness. Only HST11 results are
plotted in figures 2-4, because the results obtained using the different shear deformation
theories are very close to each other.


4.2 Material and geometric data for other results

Numerical results are obtained to study the impact response of laminated sandwich shells.
Unless otherwise stated, the following lamina properties are used:

T300/934 graphite-epoxy composite (face material):

$E_{11} = 141.2$ GPa, $E_{22} = E_{33} = 9.72$ GPa, $G_{12} = G_{13} = 5.53$ GPa,
$G_{23} = 3.74$ GPa, $\nu_{12} = 0.30$, $\nu_{23} = 0.30$, $\rho = 1536$ kg/m³,
$a = b = 0.20$ m.



Figure 2. Impact response of a simply supported square steel plate impacted by a spherical impactor.







Figure 4. Target point displacement response of a simply supported square composite plate. 
Core materials:

Case 1
$E_{22f}/E_{11c} = E_{22f}/E_{22c} = 10$, $E_{22c}/E_{33c} = 10$,
$E_{22f}/G_{12c} = E_{22f}/G_{13c} = E_{22f}/G_{23c} = 10$,
$\nu_{12c} = 0.35$, $\nu_{23c} = 0.035$, $\rho = 121.874$ kg/m³.

Case 2
$E_{22f}/E_{11c} = E_{22f}/E_{22c} = 10$, $E_{22c}/E_{33c} = 10$,
$\nu_{12c} = 0.35$, $\nu_{23c} = 0.035$, $\rho = 121.874$ kg/m³,
$E_{22f}/G_{12c} = E_{22f}/G_{13c} = E_{22f}/G_{23c} = 4, 10, 20$.

Case 3
$E_{11c} = E_{22c} = G_{12c} = 0$, $\nu_{12c} = \nu_{23c} = 0$,
$E_{22f}/G_{13c} = E_{22f}/G_{23c} = 10$, $\rho = 121.874$ kg/m³.

Cylindrical shell
$a = b = 0.20$ m, $R_x/a = \infty$, $R_y/a = 5$, $R_{xy} = \infty$.

Spherical shell
$a = b = 0.20$ m, $R_x/a = R_y/a = 5$, $R_{xy} = \infty$.

Impactor (spherical) properties
$E_i = 210$ GPa, diameter = 1.27 cm, $\nu_i = 0.30$,
$\rho_i = 7800$ kg/m³, $v_0 = 3$ m/s.

4.3 Effect of core thickness

The impact response of clamped-free composite sandwich cylindrical shells with different
core thicknesses is studied and depicted in figures 5-8. The face sheets are made of
graphite/epoxy material (1 mm thick) with a core (case 1) of variable thickness (0, 5,
10 and 15 mm). From figures 5-8 it is observed that, as the core thickness increases, the
contact force increases because of the increase in structural stiffness. The number of impact
events also decreases as the structural stiffness increases. The impactor displacement and
target point displacement responses are also affected: an increase in core thickness reduces
the magnitude of the transient displacement, while the impactor displacement increases with
structural stiffness. This is because, for a stiffer structure, the energy transferred from the
impactor to the target structure is smaller than for a less stiff structure.
Figure 5. Impact response of a centrally impacted cantilever cylindrical composite sandwich shell ($t_c$ = 0, case 1).

Figure 8. Impact response of a centrally impacted cantilever cylindrical composite sandwich shell ($t_c$ = 15 mm, case 1).

4.4 Effect of radius to span ratio

The impact response of laminated sandwich shell structures (cylindrical and spherical shells) with different $R_x/a$ and $R_y/a$ ratios is studied next. The results reduce to those for a sandwich plate when $R_x = R_y = \infty$. For cantilever cylindrical and spherical sandwich shells with 1 mm face thickness and 10 mm core thickness, the impact response is shown in figures 7 and 9. The patterns of behaviour for different $R/a$ ratios are observed to be broadly similar. This trend was also reported by Maiti & Sinha (1995), and it can be explained as follows. Point impact is a localised phenomenon up to a certain initial time, and this time decreases as the structural stiffness increases. After this initial period, the whole structure experiences the disturbance, although the impact is limited to a point, and the form of the target structure (plate or shell) then dominates the response.

4.5 Effects of core materials 

The impact response of the target structure is also studied with different core materials (cases 2 and 3). The faces are 1 mm thick graphite/epoxy composites and the cores are 10 mm thick honeycombs or rigid foams. The target structure is a clamped-free doubly curved spherical shell. The variation of the contact force ($F$), the impactor displacement response, the target point displacement response and the velocity profile of the impactor is shown in figures 9-12 for the different core materials. For a very weak core (case 2(3)), the contact force is the lowest, but the contact duration is almost the same as for the other cores. Further, the higher-order shear deformation theories predict a higher contact force. The displacement and velocity profiles of the impactor are also compared for the different core materials: both the displacement of the impactor ($w_i$) and its velocity ($\dot w_i$) increase as the core stiffness increases. All the shear deformation theories yield similar results for the displacement and velocity of the impactor, but they differ for the target point displacement response; this difference is smallest for cores with a comparatively high shear modulus.


5. Conclusion

In the present investigation, the impact response of laminated sandwich shells is analysed by the finite element method, based on the higher-order and the conventional first-order shear deformation theories. The computer programs are written in Fortran-77. Nine-noded Lagrangian isoparametric elements are used to discretise the analysis domain. The bending results of a sandwich composite plate compare well with the exact solution of Pagano (1970). The free vibration results also agree well with those of Khatua & Cheung (1973) and Raville & Ueng (1967). The variation of contact force, impactor displacement, target point displacement and impactor velocity profile is comparable with the results of Goldsmith (1960). The numerical results show that the contact force increases, whereas the target point displacement response decreases, as the core depth increases, because of the increase in structural stiffness. The impact behaviour is also studied for different target structures (plate, cylindrical and spherical shells), and it is shown that point impact is a localised phenomenon up to a certain time, after which the whole structure starts experiencing the disturbance. From the analysis of the impact behaviour of target structures with different core materials, it is noted that for a very weak core (case 2(3)) the contact force is the lowest but the contact duration is almost the same. Also, no significant differences between the FST and the various HST results are observed in the present analysis. However, this may not be the case when the local indentation is enhanced by localised transverse deformation of the core.

Figure 9. Impact response of a centrally impacted cantilever spherical composite sandwich shell ($t_c$ = 10 mm, case 2(2)).

Figure 10. Impact response of a centrally impacted cantilever spherical composite sandwich shell ($t_c$ = 10 mm, case 2(1)).

Figure 11. Impact response of a centrally impacted cantilever spherical composite sandwich shell ($t_c$ = 10 mm, case 2(3)).

Figure 12. Impact response of a centrally impacted cantilever spherical composite sandwich shell ($t_c$ = 10 mm, case 3).


List of symbols

$a, b$    planform dimensions of the shell;
$[B]$    strain-displacement matrix;
$[D]$    rigidity matrix;
$E_{11f}, E_{22f}$, etc.    moduli of elasticity of the face material;
$E_{11c}, E_{22c}$, etc.    moduli of elasticity of the core material;
$E_i$    modulus of elasticity of the impactor;
$[F]$    force vector;
$F_c$    contact force;
$F_m$    maximum contact force;
$[K_e], [K]$    element and global stiffness matrices;
$L_e$    Lagrange energy function;
$m$    impactor mass;
$[M_e], [M]$    element and global mass matrices;
$n$    contact stiffness;
$N_i$    shape function of node $i$;
$R_i$    radius of the impactor;
$R_s$    radius of the target structure;
$t_f$    thickness of the face sheet;
$t_c$    thickness of the core;
$U, V, W$    displacements along the $x$, $y$, $z$ directions, respectively;
$u_0, v_0, w_0$, etc.    degrees of freedom;
$\{\delta_e\}, \{\delta\}$    element and global displacement vectors;
$\{\ddot\delta_e\}, \{\ddot\delta\}$    element and global acceleration vectors;
$v_0$    initial velocity of the impactor;
$\ddot w_i$    acceleration of the impactor;
$\alpha$    relative indentation;
$\alpha_0$    permanent indentation;
$\alpha_m$    maximum relative indentation;
$\alpha_{cr}$    critical indentation;
$\rho$    mass density of the material;
$\xi, \eta$    natural coordinates;
(illegible)    engineering strains;
(illegible)    generalised strains.


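Appendix A. Generalised strain-displacement relations

$$
\begin{aligned}
\varepsilon^{0}_{xx} &= \frac{\partial u_0}{\partial x} + \frac{w_0}{R_x}, \\
\varepsilon^{0}_{yy} &= \frac{\partial v_0}{\partial y} + \frac{w_0}{R_y}, \\
\varepsilon^{0}_{zz} &= \theta_z, \\
\gamma^{0}_{xy} &= \frac{\partial u_0}{\partial y} + \frac{\partial v_0}{\partial x} + \frac{w_0}{R_{xy}}, \\
\gamma^{0}_{yz} &= \theta_y + \frac{\partial w_0}{\partial y}, \\
\gamma^{0}_{xz} &= \theta_x + \frac{\partial w_0}{\partial x}, \\
k_{xx} &= \frac{\partial \theta_x}{\partial x} + \frac{\theta_z}{R_x}, \\
k_{yy} &= \frac{\partial \theta_y}{\partial y} + \frac{\theta_z}{R_y}, \\
k_{zz} &= 2w_0^{*}, \\
k_{xy} &= \frac{\partial \theta_x}{\partial y} + \frac{\partial \theta_y}{\partial x} + \frac{\theta_z}{R_{xy}}, \\
k_{yz} &= 2v_0^{*} + \frac{\partial \theta_z}{\partial y}, \\
k_{xz} &= 2u_0^{*} + \frac{\partial \theta_z}{\partial x},
\end{aligned}
\qquad
\begin{aligned}
\varepsilon^{*}_{xx} &= \frac{\partial u_0^{*}}{\partial x} + \frac{w_0^{*}}{R_x}, \\
\varepsilon^{*}_{yy} &= \frac{\partial v_0^{*}}{\partial y} + \frac{w_0^{*}}{R_y}, \\
\varepsilon^{*}_{zz} &= 3\theta_z^{*}, \\
\gamma^{*}_{xy} &= \frac{\partial u_0^{*}}{\partial y} + \frac{\partial v_0^{*}}{\partial x} + \frac{w_0^{*}}{R_{xy}}, \\
\gamma^{*}_{yz} &= 3\theta_y^{*} + \frac{\partial w_0^{*}}{\partial y}, \\
\gamma^{*}_{xz} &= 3\theta_x^{*} + \frac{\partial w_0^{*}}{\partial x}, \\
k^{*}_{xx} &= \frac{\partial \theta_x^{*}}{\partial x} + \frac{\theta_z^{*}}{R_x}, \\
k^{*}_{yy} &= \frac{\partial \theta_y^{*}}{\partial y} + \frac{\theta_z^{*}}{R_y}, \\
k^{*}_{xy} &= \frac{\partial \theta_x^{*}}{\partial y} + \frac{\partial \theta_y^{*}}{\partial x} + \frac{\theta_z^{*}}{R_{xy}}, \\
k^{*}_{yz} &= \frac{\partial \theta_z^{*}}{\partial y}, \\
k^{*}_{xz} &= \frac{\partial \theta_z^{*}}{\partial x}.
\end{aligned}
\tag{A1}
$$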
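Appendix B. Strain displacement matrix [B]

$$
[B_i]_{23\times 12} =
\begin{bmatrix}
[B_{11}]_{12\times 6} & [B_{12}]_{12\times 6}\\
[B_{21}]_{11\times 6} & [B_{22}]_{11\times 6}
\end{bmatrix},
$$

where

$$
[B_{11}] = \begin{bmatrix}
N_{i,x} & 0 & N_i/R_x & 0 & 0 & 0\\
0 & N_{i,y} & N_i/R_y & 0 & 0 & 0\\
0 & 0 & 0 & 0 & 0 & N_i\\
N_{i,y} & N_{i,x} & N_i/R_{xy} & 0 & 0 & 0\\
0 & 0 & N_{i,y} & 0 & N_i & 0\\
0 & 0 & N_{i,x} & N_i & 0 & 0\\
0 & 0 & 0 & N_{i,x} & 0 & N_i/R_x\\
0 & 0 & 0 & 0 & N_{i,y} & N_i/R_y\\
0 & 0 & 0 & 0 & 0 & 0\\
0 & 0 & 0 & N_{i,y} & N_{i,x} & N_i/R_{xy}\\
0 & 0 & 0 & 0 & 0 & N_{i,y}\\
0 & 0 & 0 & 0 & 0 & N_{i,x}
\end{bmatrix},
\qquad
[B_{12}] = \begin{bmatrix}
0 & 0 & 0 & 0 & 0 & 0\\
0 & 0 & 0 & 0 & 0 & 0\\
0 & 0 & 0 & 0 & 0 & 0\\
0 & 0 & 0 & 0 & 0 & 0\\
0 & 0 & 0 & 0 & 0 & 0\\
0 & 0 & 0 & 0 & 0 & 0\\
0 & 0 & 0 & 0 & 0 & 0\\
0 & 0 & 0 & 0 & 0 & 0\\
0 & 0 & 2N_i & 0 & 0 & 0\\
0 & 0 & 0 & 0 & 0 & 0\\
0 & 2N_i & 0 & 0 & 0 & 0\\
2N_i & 0 & 0 & 0 & 0 & 0
\end{bmatrix},
$$

$$
[B_{21}] = [0]_{11\times 6},
\qquad
[B_{22}] = \begin{bmatrix}
N_{i,x} & 0 & N_i/R_x & 0 & 0 & 0\\
0 & N_{i,y} & N_i/R_y & 0 & 0 & 0\\
0 & 0 & 0 & 0 & 0 & 3N_i\\
N_{i,y} & N_{i,x} & N_i/R_{xy} & 0 & 0 & 0\\
0 & 0 & N_{i,y} & 0 & 3N_i & 0\\
0 & 0 & N_{i,x} & 3N_i & 0 & 0\\
0 & 0 & 0 & N_{i,x} & 0 & N_i/R_x\\
0 & 0 & 0 & 0 & N_{i,y} & N_i/R_y\\
0 & 0 & 0 & N_{i,y} & N_{i,x} & N_i/R_{xy}\\
0 & 0 & 0 & 0 & 0 & N_{i,y}\\
0 & 0 & 0 & 0 & 0 & N_{i,x}
\end{bmatrix},
$$

with $i = 1, 2, 3, \ldots, 9$.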
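Because $[B_i]$ follows mechanically from (A1), the nodal block can also be generated programmatically, which is a convenient way to check it. The Python sketch below assumes the nodal degree-of-freedom order $(u_0, v_0, w_0, \theta_x, \theta_y, \theta_z, u_0^*, v_0^*, w_0^*, \theta_x^*, \theta_y^*, \theta_z^*)$ and the strain order of (A1); it takes the shape function value $N_i$ and its global derivatives as inputs, so the nine-noded shape functions and the Jacobian are left out. The function name `b_matrix_node` is hypothetical.

```python
import math

import numpy as np


def b_matrix_node(N, Nx, Ny, Rx=math.inf, Ry=math.inf, Rxy=math.inf):
    """Strain-displacement block [B_i] (23 x 12) of node i, following (A1).

    Assumed DOF order: u0, v0, w0, tx, ty, tz, u0*, v0*, w0*, tx*, ty*, tz*.
    Infinite radii (math.inf) give 1/R = 0, i.e. a flat plate.
    """
    cx, cy, cxy = 1.0 / Rx, 1.0 / Ry, 1.0 / Rxy
    B = np.zeros((23, 12))

    def inplane(rows, cols):
        """Rows of the form d/dx + 1/Rx, d/dy + 1/Ry and the in-plane shear combination."""
        rxx, ryy, rxy = rows
        cu, cv, cw = cols
        B[rxx, cu], B[rxx, cw] = Nx, N * cx
        B[ryy, cv], B[ryy, cw] = Ny, N * cy
        B[rxy, cu], B[rxy, cv], B[rxy, cw] = Ny, Nx, N * cxy

    inplane((0, 1, 3), (0, 1, 2))        # eps0_xx, eps0_yy, gamma0_xy
    inplane((6, 7, 9), (3, 4, 5))        # k_xx, k_yy, k_xy
    inplane((12, 13, 15), (6, 7, 8))     # eps*_xx, eps*_yy, gamma*_xy
    inplane((18, 19, 20), (9, 10, 11))   # k*_xx, k*_yy, k*_xy

    B[2, 5] = N                          # eps0_zz = tz
    B[8, 8] = 2.0 * N                    # k_zz = 2 w0*
    B[14, 11] = 3.0 * N                  # eps*_zz = 3 tz*

    B[4, 2], B[4, 4] = Ny, N             # gamma0_yz = ty + dw0/dy
    B[5, 2], B[5, 3] = Nx, N             # gamma0_xz = tx + dw0/dx
    B[10, 7], B[10, 5] = 2.0 * N, Ny     # k_yz = 2 v0* + dtz/dy
    B[11, 6], B[11, 5] = 2.0 * N, Nx     # k_xz = 2 u0* + dtz/dx
    B[16, 8], B[16, 10] = Ny, 3.0 * N    # gamma*_yz = 3 ty* + dw0*/dy
    B[17, 8], B[17, 9] = Nx, 3.0 * N     # gamma*_xz = 3 tx* + dw0*/dx
    B[21, 11] = Ny                       # k*_yz = dtz*/dy
    B[22, 11] = Nx                       # k*_xz = dtz*/dx
    return B


if __name__ == "__main__":
    # Illustrative N_i and global derivatives at one point of a spherical shell
    # with R_x = R_y = 1.0 m and infinite R_xy.
    B_i = b_matrix_node(N=0.25, Nx=2.0, Ny=-1.5, Rx=1.0, Ry=1.0)
    print(B_i.shape, np.count_nonzero(B_i))
```

The element matrix is then the horizontal concatenation of the nine nodal blocks, giving a $23 \times 108$ matrix under the same ordering assumption.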
Adaptive finite element analysis with quadrilateral elements
using a new h-refinement strategy

Madras 600 036, India

e-mail: moorthy@civil.iitm.ernet.in

Abstract. The theory and mathematical bases of a posteriori error estimates are explained. It is shown that the Medial Axis of a body can be used to decompose it into a set of mutually non-overlapping quadrilateral and triangular primitives. A mesh generation scheme used to generate quadrilaterals inside these primitives is also presented together with its relevant implementation aspects. A new h-refinement strategy based on the weighted average energy norm and enhanced by strain energy density ratios is proposed, and two typical problems are solved to demonstrate its efficiency over the conventional refinement strategy in terms of the relative improvement in global asymptotic convergence.

Keywords. Adaptive finite element analysis; quadrilateral elements; h-refinement strategy.

1. Introduction 

The reliability and accuracy of finite element analysis (FEA) have always been a point of contention, especially in applications where the precision of the solution is critical to the analyst. Although FEA is the most widely used tool for solving a large class of engineering problems governed by partial differential equations (PDEs), the accuracy of its solutions can always be questioned. From a global point of view, this inaccuracy may be attributed to the modelling limitations of FEA, since it is practically impossible to represent the infinite number of degrees of freedom of a real physical system by a discrete numerical model. This modelling deficiency usually yields a lower bound to the solution, which manifests itself as stiffening in structural mechanics problems.

In most real-life engineering analysis problems, classical solutions are almost never available, since the problem domain is usually irregular. In such cases, the FE solution is the only benchmark that can be taken as a reference, and it is therefore imperative that it be reliable and accurate.
In this respect, one more feature of FEA should also be addressed, viz. automation of the procedure. The reason for this is two-fold: first, automating the FE procedure reduces the amount of man-machine interaction and hence the chance of human error; second, as will be shown later, error estimation procedures perform best in automatic unstructured mesh generation environments, so automation also enhances the reliability of the error estimation procedure. To address the accuracy and reliability of the FE solution, a closer insight into typical FE errors is therefore required. A study of the derivation of the error estimates on a variational basis is also necessary to understand how these estimates relate to the standard FE process.


2. Errors in FEA 

FE solution errors may be broadly classified into three major groups according to their source. Errors arise from the modelling of the continuum problem as a discrete set of equations, from rounding off and truncation due to the limited representation of floating-point variables and operations in the computer, and from the overstiffening of the structural system in general. From these viewpoints, FE solution errors may be classified as given below.

• Mathematical modelling errors

These occur because no mathematical model can fully represent all the characteristics of a physical system. Such errors are therefore introduced at the very onset, when the PDEs are formulated.

• Discretization errors

A discretized FE model uses a finite number of degrees of freedom to model a continuum, which has practically an infinite number of degrees of freedom. This overstiffens the system and produces discretization errors. In most cases (i.e. for smooth solution fields in analytic domains) it can be shown that the solution accuracy improves asymptotically as the number of degrees of freedom increases.

• Roundoff errors

Since the computer usually stores variables in a finite number of words, digits beyond the available precision are rounded off, which slightly changes the values of the variables.

In adaptive FEA, the discretization errors are minimized using suitable error estimates. 
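The asymptotic improvement described under discretization errors can be observed on a small model problem. The Python sketch below is an illustration under stated assumptions, not part of the original study: it takes (1) in one dimension with $a = 1$, $b = 0$ on $(0, 1)$, homogeneous Dirichlet conditions and the manufactured solution $u = \sin \pi x$ (so $f = -\pi^2 \sin \pi x$), solves it with linear elements by the Galerkin method, and prints the energy-norm error for successively halved meshes.

```python
import numpy as np

# Assumed 1-D model problem: eq. (1) with a = 1, b = 0 on (0, 1),
#   -u'' + f = 0,  u(0) = u(1) = 0,  manufactured solution u = sin(pi x),
# so that f = -pi^2 sin(pi x).
def f(x):
    return -np.pi**2 * np.sin(np.pi * x)


def du_exact(x):
    return np.pi * np.cos(np.pi * x)


GP, GW = np.polynomial.legendre.leggauss(5)  # 5-point Gauss rule on [-1, 1]


def solve_fe(n_el):
    """Galerkin solution with n_el linear elements; returns nodes and nodal values."""
    x = np.linspace(0.0, 1.0, n_el + 1)
    K = np.zeros((n_el + 1, n_el + 1))
    F = np.zeros(n_el + 1)
    for e in range(n_el):
        x0, x1 = x[e], x[e + 1]
        h = x1 - x0
        K[e:e + 2, e:e + 2] += np.array([[1.0, -1.0], [-1.0, 1.0]]) / h
        xq = 0.5 * (x0 + x1) + 0.5 * h * GP             # quadrature points
        N = np.vstack([(x1 - xq) / h, (xq - x0) / h])     # linear shape functions
        F[e:e + 2] += N @ (-f(xq) * 0.5 * h * GW)         # load term -int f N_j dx
    u = np.zeros(n_el + 1)                                # u = 0 at both ends
    u[1:-1] = np.linalg.solve(K[1:-1, 1:-1], F[1:-1])
    return x, u


def energy_error(x, u):
    """Energy norm of e = u - u_h, i.e. sqrt(int (u' - u_h')^2 dx) for a = 1."""
    err2 = 0.0
    for e in range(len(x) - 1):
        h = x[e + 1] - x[e]
        slope = (u[e + 1] - u[e]) / h
        xq = 0.5 * (x[e] + x[e + 1]) + 0.5 * h * GP
        err2 += np.sum((du_exact(xq) - slope) ** 2 * 0.5 * h * GW)
    return np.sqrt(err2)


if __name__ == "__main__":
    prev = None
    for n_el in (4, 8, 16, 32, 64):
        err = energy_error(*solve_fe(n_el))
        rate = "" if prev is None else f", observed rate = {np.log2(prev / err):.2f}"
        print(f"{n_el:3d} elements: ||e||_E = {err:.3e}{rate}")
        prev = err
```

For linear elements and this smooth solution the observed rate approaches 1, i.e. halving the element size roughly halves the energy-norm error; singularities or re-entrant corners would lower this rate, which is what adaptive refinement aims to recover.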

2.1 Error analysis and estimation 

In this section a typical elliptic PDE, as used by Kelly et al (1983) and Gago et al (1983), is considered, and the Galerkin method is used to discretize the weak form of the equations, as is done in traditional FEA. Subsequently, it is shown that a weighted residual expression of the strong form of the PDE can be reformulated on the discretized domain, which gives rise to certain residuals that constitute the FE discretization errors. To arrive at the same results as Kelly et al (1983) and Gago et al (1983), a detailed derivation of the equations is presented below for the benefit of readers not conversant with functional analysis.

Let us consider a domain $\Omega$ bounded by $\Gamma_D$ and $\Gamma_N$ such that $\Gamma_D \cap \Gamma_N = \emptyset$ and $\Gamma_D \cup \Gamma_N = \Gamma$. Let the domain $\Omega$ be non-singular. Let a partial differential equation be defined in $\Omega$ as

$$-\nabla^T[a\nabla u] + bu + f = 0, \tag{1}$$

where $u$ is the unknown function and $a$, $b$ and $f$ may be constants or functions depending upon the nature of the problem to be modelled, with $\Omega \subset \mathbb{R}^3$ in the most general (3-D) case. The boundary conditions are given by

$$u = \bar u, \tag{2}$$

which are the geometric boundary conditions defined on $\Gamma_D$. For the current problem, homogeneous Dirichlet boundary conditions, $\bar u = 0$, are used. The natural (Von Neumann) boundary conditions are given as

$$a\frac{\partial u}{\partial n} = q, \tag{3}$$

which are defined on $\Gamma_N$, $n$ being the unit normal to the boundary, pointing away from the domain.

The weighted residual form of (1) may be written as

$$-\int_\Omega v\nabla^T[a\nabla u]\,d\Omega + \int_\Omega buv\,d\Omega + \int_\Omega vf\,d\Omega = 0, \tag{4}$$

where $v$ is a weighting function. Applying the Gauss divergence theorem to the first integral of (4) gives

$$-\int_\Omega v\nabla^T[a\nabla u]\,d\Omega = \int_\Omega [\nabla^T v][a\nabla u]\,d\Omega - \int_\Gamma v\,a\frac{\partial u}{\partial n}\,d\Gamma. \tag{5}$$

Assuming that the weighting functions $v$ have square-integrable first derivatives and obey the homogeneous Dirichlet boundary conditions, (5), (2) and (3) yield

$$-\int_\Omega v\nabla^T[a\nabla u]\,d\Omega = \int_\Omega [\nabla^T v][a\nabla u]\,d\Omega - \int_{\Gamma_N} vq\,d\Gamma. \tag{6}$$

From (4) and (6), the weak form of the problem is obtained as

$$\int_\Omega [a\nabla^T u][\nabla v]\,d\Omega + \int_\Omega fv\,d\Omega + \int_\Omega buv\,d\Omega - \int_{\Gamma_N} vq\,d\Gamma = 0. \tag{7}$$

In this context, the concept of a bilinear form is introduced as

$$B(u, v) = \int_\Omega (L_1u\,M_1v + L_2u\,M_2v + \ldots)\,d\Omega, \tag{8}$$
where $u$, $v$ are functions in the same normed vector space and $L_i$ and $M_i$ are non-zero linear operators on $u$ and $v$. Using this relation in (7), we obtain

$$B(u, v) = \int_\Omega [a\nabla^T u][\nabla v]\,d\Omega + \int_\Omega buv\,d\Omega = -\int_\Omega fv\,d\Omega + \int_{\Gamma_N} vq\,d\Gamma. \tag{9}$$

As stated previously, $v$ belongs to a space of functions which have square-integrable first derivatives and satisfy the homogeneous Dirichlet boundary conditions. Let this function space be called $H_0$, and let $\{N_i\} \subset H_0$ be a family of functions. The function $u$ is approximated as

$$\hat u = \sum_{i=1}^{M} N_i\,\hat u_i. \tag{10}$$

Thus, with $v = N_j$, $B(\hat u, v) = B(\hat u, N_j)$, where

$$B(\hat u, N_j) = \int_\Omega [a\nabla^T\hat u][\nabla N_j]\,d\Omega + \int_\Omega bN_j\hat u\,d\Omega. \tag{11}$$

Now, the functions $N_i$ are continuous functions defined piecewise over subdomains $\Omega_i$, where $\bigcup_i \Omega_i = \Omega$. Each subdomain $\Omega_i$ is bounded by the boundary $\Gamma_i$. The continuity of the functions $\hat u$ is of the same order as that of the weighting functions $N_j$ (Bubnov-Galerkin approach), which ensures that these two function families indeed belong to $H_0$. The only difference is that the previous function space was defined on the whole domain, $H_0(\Omega)$, while the current space is defined on the subdomains, $H_0(\Omega_i)$. Hence, the discretized equation may be represented as

$$B(\hat u, N_j) = \sum_i\int_{\Omega_i} [a\nabla^T N_j][\nabla\hat u]\,d\Omega + \sum_i\int_{\Omega_i} bN_j\hat u\,d\Omega. \tag{12}$$

One important point to note about this step is that discretization introduces some artificial boundaries in the system, at which none of the boundary conditions are valid. Thus, some perturbations may arise at these zones if we revert to the weighted residual form, since the PDE is now redefined on a different domain (i.e. a connected set of discrete subdomains). To demonstrate this, the first integral of (12) is integrated by parts using the Gauss divergence theorem:

$$\sum_i\int_{\Omega_i}\nabla^T N_j[a\nabla\hat u]\,d\Omega = -\sum_i\int_{\Omega_i} N_j\nabla^T[a\nabla\hat u]\,d\Omega + \sum_{\Gamma_p\subset\Gamma_N}\int_{\Gamma_p}\left(a\frac{\partial\hat u}{\partial n}\right)N_j\,d\Gamma + \sum_{\Gamma_p\not\subset\Gamma}\int_{\Gamma_p} J\!\left(a\frac{\partial\hat u}{\partial n}\right)N_j\,d\Gamma, \tag{13}$$

where $J[a(\partial\hat u/\partial n)]_{\Gamma_p}$ is the "jump", or discontinuity, in the fluxes across the element interface $\Gamma_p$. This occurs solely because of the approximation made in the function $\hat u$. Although only a $C^0$-continuous approximation of $u$ is needed to satisfy the weak form of the equation, the first derivatives of these functions are not continuous across element boundaries, and this shows up when the PDE is written in the strong form. Thus, using (13) and (4) and rearranging the terms, the following equation is obtained as a complete weighted residual formulation of the problem using the approximate functions on the discretized topology of the domain $\Omega$.
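
$$\sum_i\int_{\Omega_i} N_j\left[-\nabla^T(a\nabla\hat u) + b\hat u + f\right]d\Omega + \sum_{\Gamma_p\subset\Gamma_N}\int_{\Gamma_p} N_j\left(a\frac{\partial\hat u}{\partial n} - q\right)d\Gamma + \sum_{\Gamma_p\not\subset\Gamma}\int_{\Gamma_p} J\!\left(a\frac{\partial\hat u}{\partial n}\right)N_j\,d\Gamma = 0. \tag{14}$$
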
From (14), which is also presented in Kelly et al (1983) and Gago et al (1983), it is evident that a new weighted residual (WR) form arises when approximate functions $\hat u$ are used on the discretized domain $\Omega_i$, and that it contains some non-zero "residual" or error terms, as shown below. These are the discretization errors introduced in the FE formulation. From the new WR form (14), let the substitute problem statement be constructed in PDE (strong) form. There are essentially three groups of integrals to be dealt with in this problem, i.e. those on the domain $\Omega$, those on the interface boundaries $\Gamma_p$ which are not a part of the boundary $\Gamma$ (and thus necessarily lie within $\Omega$), and those on the Von Neumann boundaries $\Gamma_N$. In fact, (14) gives the terms arising out of the violation of the (domain) equilibrium and of the natural boundary conditions, and associates them with a spurious "jump" of traction values across the discretization interfaces, arising from the lower-order approximation of the function $u$ by $\hat u$. The domain integrand may be written as

$$-\nabla^T(a\nabla\hat u) + b\hat u + f = R, \tag{15}$$

where $R \neq 0$, since in general $\hat u \neq u$. The terms on the Von Neumann boundary may be written as

$$q - a\frac{\partial\hat u}{\partial n} = F, \tag{16}$$

where $F$ is a non-zero vector. The term on the interface boundary is given by

$$J\!\left(a\frac{\partial\hat u}{\partial n}\right) = B, \tag{17}$$

where $B$ is also a non-zero vector quantity. Thus, $R$ is the term arising out of the approximation of $u$ inside the domain, $F$ indicates the violation of the natural boundary conditions on the Von Neumann boundaries due to the approximation of $u$, and $B$ measures the discontinuity of the first derivatives of $\hat u$ across the discretized boundaries, which arises from the discretization of the domain and the functional approximation of $u$. The above equations also make it clear that all three terms must be considered to evaluate the discretization error. In structural mechanics applications, the domain term indicates the violation of internal element equilibrium, the Von Neumann boundary terms indicate the errors in load modelling, and the element interface integral gives the jump in stresses across the element boundaries.
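To make these residual terms concrete, the Python sketch below evaluates two of them for a 1-D version of (1) with $a = 1$, $b = 0$, $u(0) = u(1) = 0$ and the manufactured solution $u = \sin \pi x$; all of these are assumptions of the illustration. Because both ends are Dirichlet boundaries, there is no Von Neumann term $F$. For this particular problem the linear-element Galerkin solution coincides with the nodal interpolant of the exact solution, so $\hat u$ is built directly from exact nodal values. Inside each element $\hat u'' = 0$, so the domain residual $R$ reduces to $f$, and the interface term $B$ becomes the jump in $\hat u'$ at each interior node.

```python
import numpy as np

# Assumed 1-D model problem: a = 1, b = 0, -u'' + f = 0 on (0, 1), u(0) = u(1) = 0,
# manufactured exact solution u = sin(pi x), hence f = -pi^2 sin(pi x).
def f(x):
    return -np.pi**2 * np.sin(np.pi * x)


GP, GW = np.polynomial.legendre.leggauss(5)  # 5-point Gauss rule on [-1, 1]


def residual_terms(n_el):
    """Element L2 norms of the domain residual R and the interior flux jumps.

    For this problem the linear-element Galerkin solution equals the nodal
    interpolant of sin(pi x), so u_h is built directly from exact nodal values.
    """
    x = np.linspace(0.0, 1.0, n_el + 1)
    u_h = np.sin(np.pi * x)
    slopes = np.diff(u_h) / np.diff(x)        # u_h' is constant on each element
    r_norms = np.empty(n_el)
    for e in range(n_el):
        h = x[e + 1] - x[e]
        xq = 0.5 * (x[e] + x[e + 1]) + 0.5 * h * GP
        # u_h'' = 0 inside an element, so R = -(a u_h')' + b u_h + f reduces to f.
        r_norms[e] = np.sqrt(np.sum(f(xq) ** 2 * 0.5 * h * GW))
    jumps = np.diff(slopes)                    # J(a du_h/dn) at the interior nodes
    return r_norms, jumps


if __name__ == "__main__":
    for n_el in (8, 16, 32):
        r, j = residual_terms(n_el)
        print(f"{n_el:3d} elements: max ||R||_K = {r.max():.3e}, "
              f"max |jump| = {np.abs(j).max():.3e}")
```

Both quantities shrink under refinement, and residual-type estimators combine them element by element into local error indicators.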
2.2 Substitute variational problem in error


As seen in the previous section, the discretization of the domain $\Omega$ and the functional approximation of $u$ are responsible for perturbations of the basic PDE, of the natural boundary conditions, and of the continuity of the first-order derivatives of the function across discrete boundaries. The selection of the approximate function $\hat u$ is guided by the requirements of the weak form, which differ from those of the actual function $u$, whose continuity requirements are higher since it satisfies the strong form. However, the approximating function space is always a subset of the original function space. Following Kelly et al (1983), the bilinear form of the error is established from the equations derived earlier, as given below. Let an error function $e$ be introduced such that

$$e = u - \hat u. \tag{18}$$

If the approximation $\hat u$ is substituted in the PDE (1) and the boundary conditions (2) and (3), the following relations are obtained:

$$-\nabla^T(a\nabla\hat u) + b\hat u + f = R. \tag{19}$$

Subtracting (19) from (1) gives

$$-\nabla^T(a\nabla e) + be + R = 0. \tag{20}$$

The Dirichlet boundary conditions on $\Gamma_D$ become

$$\hat u = \bar u, \tag{21}$$

so that $e = 0$ on $\Gamma_D$. The Von Neumann boundary condition on $\Gamma_N$ is given by

$$a\frac{\partial e}{\partial n} = q - a\frac{\partial\hat u}{\partial n}. \tag{22}$$


Thus, the above set of equations poses a strong form of the substitute problem in $e$. Using the method of weighted residuals as before to reduce the problem to its weak form (with the same weighting functions $v \in H_0$, which obey the homogeneous geometric boundary conditions), the following equation is obtained:

$$-\int_\Omega \nabla^T(a\nabla e)\,v\,d\Omega + \int_\Omega bev\,d\Omega + \int_\Omega vR\,d\Omega = 0. \tag{23}$$


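Using the Gauss divergence theorem on the first integral of the above equation, we get

$$-\int_\Omega \nabla^T(a\nabla e)\,v\,d\Omega = \int_\Omega \nabla v\,(a\nabla^T e)\,d\Omega - \int_\Gamma v\left(a\frac{\partial e}{\partial n}\right)d\Gamma. \tag{24}$$

Since $v$ satisfies the homogeneous Dirichlet boundary conditions, the above equation reduces to

$$-\int_\Omega \nabla^T(a\nabla e)\,v\,d\Omega = \int_\Omega \nabla v\,(a\nabla^T e)\,d\Omega - \int_{\Gamma_N} v\left(a\frac{\partial e}{\partial n}\right)d\Gamma. \tag{25}$$

Using the substitute Von Neumann condition (22), the above equation may be modified as

$$-\int_\Omega \nabla^T(a\nabla e)\,v\,d\Omega = \int_\Omega \nabla v\,(a\nabla^T e)\,d\Omega - \int_{\Gamma_N} v\left(q - a\frac{\partial\hat u}{\partial n}\right)d\Gamma. \tag{26}$$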
Thus, substituting this into the weighted residual equation (23), the following equation is obtained:

$$\int_\Omega \nabla v\,(a\nabla^T e)\,d\Omega - \int_{\Gamma_N} v\left(q - a\frac{\partial\hat u}{\partial n}\right)d\Gamma + \int_\Omega bev\,d\Omega + \int_\Omega vR\,d\Omega = 0. \tag{27}$$

From (9), the bilinear form $B(e, v)$ may be written as

$$B(e, v) = \int_\Omega \nabla v\,(a\nabla^T e)\,d\Omega + \int_\Omega bev\,d\Omega. \tag{28}$$

In structural mechanics problems, the bilinear form $B(u, v)$ in (9) represents a measure of the internal energy (strain energy) of the body, whereas the right-hand side represents the external work done by the applied loads. In virtual work formulations, $v$ is the vector of virtual displacements, and the bilinear form is the internal virtual work done by the body in response to the external virtual work done by the loads. In (28), therefore, $B(e, v)$ is a measure of the error in the internal energy of the body, and $v$ are general Galerkin weighting functions. The external perturbation (on the non-discretized domain) is due to $R$ (the equilibrium error on the domain due to the modelled $\hat u$) and $F$ (the violated natural boundary conditions).

As shown in § 2.1, let the domain $\Omega$ be discretized into subdomains (finite elements) $\Omega_i$, thus introducing several discretized (interelement) boundaries $\Gamma_p$ distinct from the Dirichlet and the Von Neumann boundaries, i.e.

$$\Omega = \bigcup_i \Omega_i. \tag{29}$$

Using (29) and the discrete weighting function $v_j$, where $j$ is the index of the (discrete) degree of freedom, (28) is recast as

$$B(e, v_j) = \sum_i\left[\int_{\Omega_i}\nabla v_j\,(a\nabla^T e)\,d\Omega_i + \int_{\Omega_i} bev_j\,d\Omega_i\right]. \tag{30}$$


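Using the Gauss divergence theorem to integrate the first domain integral in (30), the following relation is obtained:

$$\sum_i\int_{\Omega_i}\nabla v_j\,(a\nabla^T e)\,d\Omega_i = -\sum_i\int_{\Omega_i}\nabla^T(a\nabla e)\,v_j\,d\Omega_i + \sum_{\Gamma_p\subset\Gamma_N}\int_{\Gamma_p} v_j\left(a\frac{\partial e}{\partial n}\right)d\Gamma + \sum_{\Gamma_p\not\subset\Gamma}\int_{\Gamma_p} J\!\left(a\frac{\partial e}{\partial n}\right)v_j\,d\Gamma. \tag{31}$$

Substituting (31) in (30), we obtain:

$$B(e, v_j) = -\sum_i\int_{\Omega_i}\nabla^T(a\nabla e)\,v_j\,d\Omega_i + \sum_i\int_{\Omega_i} bev_j\,d\Omega_i + \sum_{\Gamma_p\subset\Gamma_N}\int_{\Gamma_p} v_j\left(a\frac{\partial e}{\partial n}\right)d\Gamma + \sum_{\Gamma_p\not\subset\Gamma}\int_{\Gamma_p} J\!\left(a\frac{\partial e}{\partial n}\right)v_j\,d\Gamma. \tag{32}$$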
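From the above equations it can be seen that the term $\int_{\Gamma_p} J[a(\partial e/\partial n)]\,v_j\,d\Gamma$ actually reduces to $-\int_{\Gamma_p} J[a(\partial\hat u/\partial n)]\,v_j\,d\Gamma$, since the exact solution $u$ does not have discontinuous first-order derivatives across the interelement boundary. Also, from the original partial differential equation of the system and the modelled equation, it is obvious that the following relation holds:

$$-\int_{\Omega_i}\nabla^T(a\nabla e)\,v_j\,d\Omega_i + \int_{\Omega_i} bev_j\,d\Omega_i = -\int_{\Omega_i}\nabla^T\big(a\nabla(u - \hat u)\big)v_j\,d\Omega_i + \int_{\Omega_i} b(u - \hat u)v_j\,d\Omega_i, \tag{33}$$

since

$$-\int_{\Omega_i}\nabla^T(a\nabla u)\,v_j\,d\Omega_i + \int_{\Omega_i} buv_j\,d\Omega_i = -\int_{\Omega_i} fv_j\,d\Omega_i. \tag{34}$$

Substituting (33) and (34) in (32), and using (22) on $\Gamma_N$,

$$B(e, v_j) = \sum_i\left[\int_{\Omega_i}\nabla^T(a\nabla\hat u)\,v_j\,d\Omega_i - \int_{\Omega_i} b\hat uv_j\,d\Omega_i - \int_{\Omega_i} fv_j\,d\Omega_i\right] + \sum_{\Gamma_p\subset\Gamma_N}\int_{\Gamma_p} v_j\left(q - a\frac{\partial\hat u}{\partial n}\right)d\Gamma - \sum_{\Gamma_p\not\subset\Gamma}\int_{\Gamma_p} J\!\left(a\frac{\partial\hat u}{\partial n}\right)v_j\,d\Gamma. \tag{35}$$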


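On grouping the terms under the domain and boundary integrals and replacing the domain residual term by $R_i$ from (19), we get

$$B(e, v_j) = -\sum_i\int_{\Omega_i} R_i\,v_j\,d\Omega + \sum_{\Gamma_p\subset\Gamma_N}\int_{\Gamma_p}\left(q - a\frac{\partial\hat u}{\partial n}\right)v_j\,d\Gamma - \sum_{\Gamma_p\not\subset\Gamma}\int_{\Gamma_p} J\!\left(a\frac{\partial\hat u}{\partial n}\right)v_j\,d\Gamma. \tag{36}$$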


Because of Galerkin orthogonality, the bilinear form $B(e, v_j)$ is zero for every weighting function $v_j$ of the finite element space. Thus, (36), which has also been derived by Kelly et al (1983) and Gago et al (1983), indicates that the error in displacements $e$ is such that a homogeneous form of (36) is satisfied over the whole domain. Since (36) then becomes an identity, it cannot be solved directly for $e$, and several assumptions are made to evaluate it. For example, as reported by Kelly (1984) and Kelly & Isles (1989), it is assumed that the domain residual $R$ and the interelement traction jump self-equilibrate over an element, and the natural boundary condition violations are treated as traction jumps on $\Gamma_N$. In fact, (36) represents an enhanced FE equation, in which the domain term is the residual internal energy, the term on the interelement boundaries $\Gamma_p$ is the work done by unbalanced internal forces, and the term on the Von Neumann boundary $\Gamma_N$ is the work done by residual forces on loaded edges. This implies that the global sum of the residual forces yields a measure of the equilibrium error due to discretization. Thus, the standard FE energy, if enhanced by this residual energy, may yield better results. This concept is used by Cantin et al (1978) and Cook (1982) to estimate a better stress distribution from a given set of FE results by iteratively improving the displacements using the residual loads.
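Galerkin orthogonality can be checked numerically on a small problem. The Python sketch below assumes the 1-D form of (1) with $a = b = 1$ on $(0, 1)$, $u(0) = u(1) = 0$ and the manufactured solution $u = \sin \pi x$ (so $f = -(\pi^2 + 1)\sin \pi x$); none of these choices come from the paper. It solves the Galerkin equations of (9) with linear elements and then evaluates $B(e, N_j)$ of (28) for every interior shape function by Gauss quadrature. The error $e$ itself is clearly non-zero, yet $B(e, N_j)$ vanishes to round-off, which is why (36) cannot be solved directly for $e$.

```python
import numpy as np

# Assumed 1-D model problem with a = 1, b = 1: -u'' + u + f = 0 on (0, 1),
# u(0) = u(1) = 0, exact u = sin(pi x)  =>  f = -(pi^2 + 1) sin(pi x).
def u_exact(x):
    return np.sin(np.pi * x)


def du_exact(x):
    return np.pi * np.cos(np.pi * x)


def f(x):
    return -(np.pi**2 + 1.0) * np.sin(np.pi * x)


GP, GW = np.polynomial.legendre.leggauss(5)  # 5-point Gauss rule on [-1, 1]


def element_quadrature(x0, x1):
    """Quadrature points and weights, linear shape functions and their derivatives."""
    h = x1 - x0
    xq = 0.5 * (x0 + x1) + 0.5 * h * GP
    wq = 0.5 * h * GW
    N = np.vstack([(x1 - xq) / h, (xq - x0) / h])   # shape (2, number of points)
    dN = np.array([-1.0 / h, 1.0 / h])                # constant derivatives
    return xq, wq, N, dN


def galerkin_orthogonality(n_el):
    """Return B(e, N_j) for the interior j and the energy norm of e."""
    x = np.linspace(0.0, 1.0, n_el + 1)
    n = n_el + 1
    K, F = np.zeros((n, n)), np.zeros(n)
    for e in range(n_el):
        xq, wq, N, dN = element_quadrature(x[e], x[e + 1])
        K[e:e + 2, e:e + 2] += np.outer(dN, dN) * wq.sum() + (N * wq) @ N.T
        F[e:e + 2] += N @ (-f(xq) * wq)               # right-hand side of (9)
    u_h = np.zeros(n)
    u_h[1:-1] = np.linalg.solve(K[1:-1, 1:-1], F[1:-1])
    B, err2 = np.zeros(n), 0.0
    for e in range(n_el):
        xq, wq, N, dN = element_quadrature(x[e], x[e + 1])
        e_val = u_exact(xq) - u_h[e:e + 2] @ N        # e at the quadrature points
        de = du_exact(xq) - u_h[e:e + 2] @ dN         # e' at the quadrature points
        B[e:e + 2] += dN * np.sum(de * wq) + N @ (e_val * wq)
        err2 += np.sum((de**2 + e_val**2) * wq)
    return B[1:-1], np.sqrt(err2)


if __name__ == "__main__":
    B, err = galerkin_orthogonality(16)
    print(f"max |B(e, N_j)| = {np.abs(B).max():.1e}, energy norm of e = {err:.3e}")
```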
Figure 1. (a) Node patch, (b) element patch. A, B, C, D, E, F = centroidal superconvergent points of the elements; p = patch assembly node; @ = patch assembly element.


2.3 Error estimation procedures 

Several a posteriori error estimates have been reported in the literature. Among the most notable are those reported by Babuska & Szabo (1982), where the residual form of the estimate is considered. Zienkiewicz & Zhu (1987) designed a best-guess stress-type error estimate based on least-squares smoothing of the stresses. More recently, Zienkiewicz & Zhu (1992) reported a superconvergent error estimate based on patchwise stress recovery. The method was enhanced by Wiberg and coworkers (Wiberg & Abdulwahab 1993; Wiberg & Li 1994; Wiberg et al 1994) and by Blacker & Belytschko (1994), who used equilibrium and natural boundary condition residuals together with conjoint polynomials to derive an asymptotically exact estimate.

The development of such a posteriori error estimates focuses on two aspects. First, a smoothed stress distribution needs to be extracted from the FE stresses; next, a proper refinement criterion needs to be designed which determines the new element size for a given error percentage. The errors are usually computed as a measure of the difference between the FE stresses and the smoothed stresses.
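As an illustration of what a refinement criterion does, the conventional criterion of Zienkiewicz & Zhu (1987) is sketched below in Python. It is not the new strategy proposed in this paper, and the numbers in the usage lines are made up. For $m$ elements with estimated errors $\|e\|_i$, FE energy norm $\|\hat u\|$ and a permissible relative error $\bar\eta$, the permissible error per element is $\bar e_m = \bar\eta\,[(\|\hat u\|^2 + \|e\|^2)/m]^{1/2}$; the ratio $\xi_i = \|e\|_i/\bar e_m$ then gives the predicted new size $h_{new} = h_i\,\xi_i^{-1/p}$ for elements of polynomial order $p$.

```python
import numpy as np


def zz_refinement(elem_errors, u_norm, h, eta_target, p=1):
    """Conventional Zienkiewicz-Zhu (1987) refinement criterion (illustrative).

    elem_errors : estimated energy-norm errors ||e||_i of the m elements
    u_norm      : energy norm of the finite element solution
    h           : current element sizes
    eta_target  : permissible relative error, e.g. 0.05 for 5 per cent
    p           : polynomial order of the elements (1 for QUAD4)
    Returns the refinement ratios xi_i and the predicted new element sizes.
    """
    elem_errors = np.asarray(elem_errors, dtype=float)
    h = np.asarray(h, dtype=float)
    m = elem_errors.size
    e_norm2 = np.sum(elem_errors**2)                              # ||e||^2
    e_allow = eta_target * np.sqrt((u_norm**2 + e_norm2) / m)     # permissible error per element
    xi = elem_errors / e_allow                                    # > 1: refine, < 1: may coarsen
    h_new = h / xi ** (1.0 / p)
    return xi, h_new


if __name__ == "__main__":
    xi, h_new = zz_refinement([0.02, 0.05, 0.01, 0.08], u_norm=1.0,
                              h=[0.25] * 4, eta_target=0.05)
    print("xi    =", np.round(xi, 2))
    print("h_new =", np.round(h_new, 3))
```

Because $\bar e_m$ is the same for every element, this criterion aims at equidistributing the error; the abstract states that the new strategy modifies it through weighted average energy norms and strain energy density ratios.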

In the patchwise stress recovery method, a patch of elements is selected around a node, as shown in figure 1a. The unknown smoothed stress variation over this patch is assumed to be

$$\sigma^* = [1, x, y, xy]\{a\}, \tag{37}$$
where $\{a\}$ is a vector of undetermined coefficients. The discrete $L_2$ norm of the stress difference is taken as the stress error functional $\Pi_\sigma$:

$$\Pi_\sigma = \sum_{i=1}^{NP}[\sigma^* - \sigma_{hi}]^T[\sigma^* - \sigma_{hi}], \tag{38}$$

where $NP$ = number of superconvergent sampling points on the patch and $\sigma_{hi}$ = superconvergent FE stress at the $i$th sampling point.

Equation (38) is valid only in the shaded area of the patch shown in figure 1a. This equation is a least-squares estimate of the smoothed stresses from the superconvergent FE stresses $\sigma_{hi}$. On differentiating this functional with respect to the undetermined coefficients, we get the following set of linear equations:

$$\sum_{i=1}^{NP}[P_i]^T[P_i]\{a\} = \sum_{i=1}^{NP}[P_i]^T\sigma_{hi}, \tag{39}$$

where

$$[P_i] = [1, x_i, y_i, x_i y_i].$$
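Equations (37) to (39) amount to an ordinary least-squares fit for each stress component. A minimal Python sketch is given below; the function names and the six sampling points in the usage lines are illustrative assumptions, standing in for the centroidal points A to F of figure 1, and the stress samples are made up. The recovered nodal value is $\sigma^*$ evaluated at the patch assembly node.

```python
import numpy as np


def patch_fit(points, sigma_h):
    """Least-squares fit of sigma* = [1, x, y, xy]{a} over a patch, eqs. (37)-(39).

    points  : (NP, 2) coordinates of the superconvergent sampling points
    sigma_h : (NP,) FE stress component sampled at those points
    Returns the coefficient vector {a} of length 4.
    """
    pts = np.asarray(points, dtype=float)
    s = np.asarray(sigma_h, dtype=float)
    x, y = pts[:, 0], pts[:, 1]
    P = np.column_stack([np.ones_like(x), x, y, x * y])   # row i is [P_i]
    A = P.T @ P                                            # sum_i [P_i]^T [P_i]
    b = P.T @ s                                            # sum_i [P_i]^T sigma_hi
    return np.linalg.solve(A, b)


def evaluate(a, x, y):
    """Smoothed stress sigma* at (x, y)."""
    return a[0] + a[1] * x + a[2] * y + a[3] * x * y


if __name__ == "__main__":
    # Six illustrative sampling points of a patch around the node (0, 0).
    pts = [(-1.0, -0.5), (0.0, -0.5), (1.0, -0.5), (-1.0, 0.5), (0.0, 0.5), (1.0, 0.5)]
    sigma_h = [10.2, 11.0, 11.9, 9.1, 10.1, 10.8]   # made-up FE stress samples
    a = patch_fit(pts, sigma_h)
    print("coefficients {a}:", np.round(a, 4))
    print("recovered stress at the node:", evaluate(a, 0.0, 0.0))
```

At least four well-spread sampling points are needed for $\sum [P_i]^T[P_i]$ to be non-singular; with more points than coefficients the fit smooths the sampled stresses rather than interpolating them.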


Now, let us consider (36) again. Let the error in displacement $e$ be replaced by

$$e = u - u^*, \tag{40}$$

where $u$ is the exact displacement and $u^*$ is the displacement corresponding to the smoothed solution. The right-hand side of (36) then implies that the smoothed solution should produce no equilibrium residual on the domain, no interelement residual on FE edges and no boundary residual on the Von Neumann boundaries. These conditions are therefore used as constraint equations to enhance the functional $\Pi_\sigma$ of (38), as

$$\Pi_\sigma = \sum_{i=1}^{NP}[\sigma^* - \sigma_{hi}]^T[\sigma^* - \sigma_{hi}] + \beta_1\int_{\Omega_p}[\nabla^T\sigma^* - f]^T[\nabla^T\sigma^* - f]\,d\Omega_p + \beta_2\int_{\Gamma_p}\big[[n](\sigma^*) - t\big]^T\big[[n](\sigma^*) - t\big]\,d\Gamma_p, \tag{41}$$

where $\beta_1$, $\beta_2$ = penalty coefficients, $NP$ = number of superconvergent sampling points on the patch, $[\nabla^T\sigma^* - f]$ = equilibrium residual, $\Omega_p$ = patch domain, $[n](\sigma^*) - t$ = Von Neumann residual, and $\Gamma_p$ = Von Neumann boundary on the patch.

The term on the interelement boundary is omitted because the smoothed stress polynomial is continuous over the patch. Equation (41) was also presented by Wiberg et al (1994) and Blacker & Belytschko (1994), but no strict justification was given for enhancing the basic stress functional with the equilibrium and natural boundary condition terms. Thus, (36) is the basic relation used to extract smoothed stresses in most of the published a posteriori error estimators, either directly or as a constraint condition to enhance the least-squares stress functional (38).
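The effect of the penalty terms in (41) can be seen in a deliberately reduced 1-D setting, which is an assumption of this sketch and not the 2-D patch of the paper. A linear smoothed stress $\sigma^* = a_0 + a_1 x$ is fitted to sampled values, and the 1-D analogue of the equilibrium residual, $d\sigma^*/dx - f$ with the sign convention of (41), is added with penalty $\beta$ over a patch of length $L$. Because the residual is constant here, its integral is $L(a_1 - f)^2$, so the penalty enters as one extra weighted row of an ordinary least-squares problem. As $\beta$ grows, the fitted slope is driven towards the equilibrium value $a_1 = f$. The sample data in the usage lines are made up.

```python
import numpy as np


def penalised_fit(x_s, sigma_h, f, length, beta):
    """Fit sigma* = a0 + a1*x with a penalised equilibrium residual (1-D analogue).

    Minimises  sum_i (sigma*(x_i) - sigma_h,i)^2 + beta * int_patch (dsigma*/dx - f)^2 dx,
    i.e. the first two terms of (41) for a linear sigma* and a constant f.
    Returns the coefficients [a0, a1].
    """
    x_s = np.asarray(x_s, dtype=float)
    A_data = np.column_stack([np.ones_like(x_s), x_s])   # data rows [1, x_i]
    b_data = np.asarray(sigma_h, dtype=float)
    # The residual a1 - f is constant, so its integral is length * (a1 - f)^2:
    # it enters as one extra least-squares row weighted by sqrt(beta * length).
    w = np.sqrt(beta * length)
    A = np.vstack([A_data, [0.0, w]])
    b = np.append(b_data, w * f)
    coeffs, *_ = np.linalg.lstsq(A, b, rcond=None)
    return coeffs


if __name__ == "__main__":
    x = [0.0, 1.0, 2.0, 3.0]          # made-up sampling points on a patch of length 3
    s = [1.0, 1.8, 3.3, 3.9]          # made-up FE stress samples
    for beta in (0.0, 1.0, 1.0e6):
        a0, a1 = penalised_fit(x, s, f=0.5, length=3.0, beta=beta)
        print(f"beta = {beta:g}: a0 = {a0:.4f}, a1 = {a1:.4f}")
```

With $\beta = 0$ the fit is the plain least-squares fit of (38); very large penalties effectively enforce the constraint exactly, at the cost of worse conditioning of the least-squares system.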

The use of augmented patch-based stress extraction methods involves some mathematical inconsistencies in setting up the limits of integration of the equilibrium and Von Neumann residuals. Only a part of the patch is influenced by the least-squares polynomial (indicated by hatch marks in figure 1a), which implies that the sampled values of $\sigma^*$ in zones outside this part are no longer reliable. Hence the integration limits of the equilibrium and Von Neumann residuals cannot extend over the entire patch. Recently, Wiberg et al (1994, 1995) proposed the element patch (figure 1b), where such problems do not occur because the integration limits cover entire elements. However, as it is difficult to compute a least-squares projection of the stresses over the element patch, Wiberg et al (1994, 1995) use a least-squares displacement projection technique, from which the stresses are computed using the strain-displacement and constitutive laws. Moreover, as the QUAD4 element does not possess superconvergent displacement points, it is not practicable to extract superconvergent stresses directly from an enhanced displacement field, even though the enhanced displacement field is constrained by (36).

In this context, Mukherjee & Krishnamoorthy (1996) have presented the element patch-based superconvergent error estimate, which uses a least-squares fit of an enhanced stress polynomial and applies the penalty constraints of (36) directly. Unlike Wiberg et al (1994, 1995), no displacement projection is performed, and thus the superconvergent nature of the stresses is guaranteed. The examples in this paper are solved using this estimate.


3. Automatic mesh generation procedures 

A major part of the effort in the adaptive FE process lies in the mesh generation procedure. 
Good reviews of mesh generation schemes may be found in Buell & Bush (1973), Thacker 
(1980), and Ho-Le (1988). The available mesh generators now in use may be generally 
classified into two groups, i.e. mapped and automatic mesh generators. In the mapped 
mesh generation process the problem domain is usually manually decomposed into a set 
of mappable regions which are mutually non-intersecting. A mapping technique, usually 
an isoparametric or a transfinite procedure, is employed to explicitly or implicitly handle 
a set of geometric representations within each mapped region. These representations are 
defined in terms of the information specified on the boundaries of the subregion. More 
specifically, the isoparametric scheme is used to interpolate points in the subregion domain, 
while collocating at discrete points on the subregion boundary, and the transfinite mapping 
method interpolates points in the subregion domain, while collocating globally on the 
subregion boundaries. Because the interior of each subregion is fully determined by its boundary data, transitions in element size cannot be created unless special measures are employed; hence, although these methods are fast, they are not flexible enough for local control.
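The distinction drawn above, collocation at discrete boundary points versus collocation along the whole boundary, can be seen in a minimal sketch of bilinearly blended (Coons) transfinite interpolation. This is the generic textbook form of a transfinite map, not the authors' implementation; the function names `coons` and `structured_grid` and the quarter-annulus example are illustrative assumptions. Because the four boundary curves enter the formula as functions, every interior point is interpolated from the complete boundary, and the boundary curves themselves are reproduced exactly.

```python
import math
from typing import Callable, List, Tuple

Point = Tuple[float, float]
Curve = Callable[[float], Point]  # each curve is parametrized on [0, 1]


def coons(bottom: Curve, top: Curve, left: Curve, right: Curve,
          u: float, v: float) -> Point:
    """Bilinearly blended transfinite (Coons) interpolation at (u, v).

    Assumes compatible corners: bottom(0) == left(0), bottom(1) == right(0),
    top(0) == left(1) and top(1) == right(1).
    """
    p00, p10 = bottom(0.0), bottom(1.0)
    p01, p11 = top(0.0), top(1.0)
    b, t, l, r = bottom(u), top(u), left(v), right(v)
    result = []
    for k in range(2):  # x and y components
        # Sum of the two ruled surfaces built from opposite boundary curves
        ruled = (1 - v) * b[k] + v * t[k] + (1 - u) * l[k] + u * r[k]
        # The ruled sum contains the bilinear corner interpolant once too often
        corners = ((1 - u) * (1 - v) * p00[k] + u * (1 - v) * p10[k]
                   + (1 - u) * v * p01[k] + u * v * p11[k])
        result.append(ruled - corners)
    return (result[0], result[1])


def structured_grid(bottom: Curve, top: Curve, left: Curve, right: Curve,
                    nu: int, nv: int) -> List[List[Point]]:
    """Nodes of an nu x nv mapped mesh, row by row from v = 0 to v = 1."""
    return [[coons(bottom, top, left, right, i / nu, j / nv)
             for i in range(nu + 1)] for j in range(nv + 1)]


# Hypothetical example region: quarter annulus, inner radius 1, outer radius 2
bottom = lambda s: (1.0 + s, 0.0)
top = lambda s: (0.0, 1.0 + s)
left = lambda s: (math.cos(0.5 * math.pi * s), math.sin(0.5 * math.pi * s))
right = lambda s: (2.0 * math.cos(0.5 * math.pi * s),
                   2.0 * math.sin(0.5 * math.pi * s))

nodes = structured_grid(bottom, top, left, right, 4, 2)
```

With four straight edges the same formula reduces to the bilinear isoparametric map; the curved arcs in the example are where the two approaches differ.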

Automatic or unstructured mesh generators are generally boundary-based, i.e. the boundary definition of the meshable object is taken as the starting point of the mesh generator, and as the generation procedure progresses, the geometry of the meshable domain changes continually. Thus, at every step of element generation, the geometry of the unmeshed domain needs to be evaluated. Hence, even though these procedures offer better mesh control and are more flexible, they are computationally more intensive. The storage requirements of unstructured mesh generators are also larger, since both the connectivity and the coordinates of the nodes need to be stored, whereas for mapped mesh generators only the coordinates need to be stored.
Thus, it was commented in Krishnamoorthy et al (1995) that the motivation for developing a new mesh generator is to design a system that combines the computational efficiency of the mapping techniques with the flexibility and control characteristics of unstructured mesh generators. Keeping this in mind, a new method of quadrilateral mesh generation was proposed (Krishnamoorthy et al 1995), called Meshing by Successive Superelement Decomposition (MSD), which is composed of two parts: the Approximate Skeletal Method, which automatically decomposes the problem domain into a set of mappable, topologically simple superelements, and Meshing by Successive Decomposition, which is a recursive quadrilateral mesh generation scheme acting on the individual superelements.

3.1 Approximate skeletal method 

The theoretical basis of the approximate skeletal method is the generation of medial axes of objects, which is used for object recognition in pattern recognition theory. The implementation details of this technique are presented by Krishnamoorthy et al (1995). The theoretical basis of the method is briefly discussed in the following sections.


3.1a Medial axis transforms: In pattern recognition theory, a skeleton, medial axis or symmetric axis of an object is defined as the locus of those points whose minimum distance to the boundary of the object is attained at two or more boundary points. The method for generating the skeleton is usually referred to as the Medial Axis Transform (MAT) technique or the Symmetric Axis Transform (SAT) technique. The existence of skeletons for various biological shapes and their use for shape description were first proposed by Blum (1967). Shapes are normally described by their boundaries; in MAT, however, the shape description of an object includes its interior (and exterior) through a primitive called the maximal disk. Hence the description of an object consists of two primitives, viz. the medial axis (MA) and the maximal disk (MD). The locus of the centre of the MD is the MA itself, and the maximal disks form an envelope that describes the boundary of the object.

The flexibility and generality of shape recognition by MAT cannot be overstated. For example, the shape features shown in figure 2 are identified by simple local perturbations in the MA or the MD. In the case of a pinch, the "noise" in the MA reveals its presence. The worm, wedge, cup and flare are all characterized by local curvature values of the radius function of the MD. In specific cases, like the worm, the curvature change is zero; in all other cases it is non-trivial. Thus, by using the MA and the MD, not only the boundary features but also the width properties of the object may be identified.

Mathematically, MAT uses an intrinsic coordinate system to define any two-dimensional object. Given a closed boundary $A$ of a domain $\Omega$, the Euclidean distance $d(x, A)$ from any point $x$ to the set of boundary points $A$ is given by

$$d(x, A) = \min\,[\,d(x, y) : y \in A\,]. \qquad (42)$$

For some points, more than one boundary point satisfies this minimal-distance criterion, and the locus of such points is the MA of the domain. Let this MA be designated $S$; then a function $f(x)$ can be defined that maps $S$ into the set $\mathbb{R}_{\ge 0}$ of non-negative real numbers as

$$f(x) = d(x, A), \quad x \in S. \qquad (43)$$

From (42) and (43), it is clear that $f(x)$ is the radius function, or disk function, of the domain. The value of this function at any $x$ on $S$ gives the radius of the MD at $x$.

Figure 2. Some elementary shape descriptors based on width properties and their MA (panels include kink, cup and flare).

It can be shown that the MA of a shape and the Voronoi diagram of its edges are closely related. For convex domains they are identical; for non-convex domains, the Voronoi diagram contains additional edges associated with the reentrant vertices, so the MA is only a subset of it.
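Definitions (42) and (43) can be checked numerically on a polygonal domain. The sketch below is an illustration only, not the parametric MA construction of Krishnamoorthy et al (1995): it evaluates the radius function $f(x) = d(x, A)$ by brute force over all boundary edges and flags $x$ as an MA point when the minimum distance is attained at two or more distinct boundary points. The helper names, the rectangle and the tolerance are assumptions.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def closest_on_segment(p: Point, a: Point, b: Point) -> Point:
    """Nearest point to p on segment ab (orthogonal projection clamped to ab)."""
    dx, dy = b[0] - a[0], b[1] - a[1]
    t = ((p[0] - a[0]) * dx + (p[1] - a[1]) * dy) / (dx * dx + dy * dy)
    t = max(0.0, min(1.0, t))
    return (a[0] + t * dx, a[1] + t * dy)


def boundary_feet(p: Point, poly: List[Point]) -> List[Point]:
    """Closest point on every edge of the closed polygon boundary A."""
    n = len(poly)
    return [closest_on_segment(p, poly[i], poly[(i + 1) % n]) for i in range(n)]


def radius_function(p: Point, poly: List[Point]) -> float:
    """f(x) = d(x, A), the distance from p to the boundary, as in (42)-(43)."""
    return min(math.dist(p, q) for q in boundary_feet(p, poly))


def on_medial_axis(p: Point, poly: List[Point], tol: float = 1e-9) -> bool:
    """True if the minimum distance is attained at two or more distinct points."""
    feet = boundary_feet(p, poly)
    dists = [math.dist(p, q) for q in feet]
    d_min = min(dists)
    nearest = [q for q, d in zip(feet, dists) if d - d_min <= tol]
    # Adjacent edges meeting at a vertex share the same foot, so require
    # the nearest boundary points to be genuinely different
    return any(math.dist(nearest[0], q) > tol for q in nearest[1:])


rect = [(0.0, 0.0), (4.0, 0.0), (4.0, 2.0), (0.0, 2.0)]
```

For this 4 x 2 rectangle the centre line $y = 1$ between $x = 1$ and $x = 3$ and the four corner bisectors are flagged, and the radius function on the centre line equals the half-width, 1.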

Geometrically, the MA is composed of several connected segments joined at a set of points called skeleton nodes, which are also called branch points (Blum 1967). Hence, the MA of a 2D object is a one-dimensional planar graph without any area. The MAT of an object can exist even outside the domain, as shown in figure 2. If the domain itself is considered as a hole in a very large bounded circle, then many properties of the internal MAT are also seen in the external MAT. External MATs, however, have two distinct properties of their own: for convex shapes they do not usually exist, and even when they do, they are not connected.

The interior MA has been used in pattern recognition, as shown by Blum & Nagel (1973), and in automatic mesh generation, as reported by Gursoy & Patrikalakis (1992) and Tam & Armstrong (1991), while the exterior MA is used for motion planning and for mesh generation in CFD applications. Figure 3 shows the interior and the exterior skeletons of an arbitrary domain.

The MATs of continuous shapes were analysed by Blum (1973), Calabi & Hartnett (1968a, 1968b), and Nagel & Blum (1976). Discrete MA theory and its computer implementation were developed by Montanari (1968). The works of Lee (1982) and Bookstein (1979) are also noteworthy.









Figure 3. Outer and inner MAT of a domain. O = outer medial axis, I = inner 
medial axis. B = boundary. 

3.1b Domain decomposition using MAT: In addressing domain decomposition, the methods used for generating MA branches should first be highlighted. In the present case, Krishnamoorthy et al (1995) expressed the MA in simple parametric form, since the domain boundaries could be represented by analytic equations. Thinning algorithms are usually used in pattern recognition, and geometric search techniques in CAD systems, to generate the MA branches (Turkiyyah & Fenves 1988). In FE mesh generation applications, domain feature extraction is not carried out from the MA, and hence a mathematically exact MA extraction is not necessary. This is the basis of the method presented in Krishnamoorthy et al (1995), where simplified representations of the boundary and the MA branches yield large computational savings in the domain decomposition process.

The assumptions introduced in Krishnamoorthy et al (1995) produce no major perturbation of the MA. All curved boundary segments are represented as a union of fine straight segments. This simplifies computations, since the MA branches generated by straight boundary edges are only first- or second-degree curves. If, instead, the boundary were represented by quadratic polynomials, the MA would become a quartic polynomial. In the proposed algorithm, the MA is therefore handled by piecewise continuous quadratic polynomials, consistent with the simplified boundary representation.

The MA branches (skeletal curves), the radial lines, which are rays traced from the skeleton nodes to the nearest boundary segments, and the boundary segments themselves decompose the domain into a set of non-intersecting, topologically simple superelements, which are considered individually for mesh generation. The radial lines indicate the radii of the MDs centred at the skeleton nodes.

The superelement generation process is thus composed of four steps as shown below. 

• Generation of equidistant curve.

• Generation of skeletal curve segments.

• Ray tracing from skeleton node to generate radial lines.

• Geometric merging processes to correct distorted superelements.

Figure 4. Equidistant and skeletal curves.


3.1b(i) Generation of equidistant curve - An equidistant curve for a pair of line segments is defined as the locus of all points that are equally distant from these segments. In figure 4, the segments E1 and E2 are used to generate the equidistant curve RW, which consists of five piecewise continuous curve segments RS, ST, TU, UV and VW. The junctions between these pieces are marked by the perpendiculars P1, P2, P3 and P4. The equidistant curve is a piecewise continuous quadratic polynomial over these five segments.

The coordinate systems and the computation procedures for the points on the equidistant curve may be found in Krishnamoorthy et al (1995).
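The piecewise structure of the equidistant curve can be reproduced with a small numerical sketch. It assumes two straight segments placed one above the other (so that each vertical line crosses the curve once) and locates the equidistant point on each vertical line by bisection; the function names are hypothetical, and this is not the closed-form procedure of Krishnamoorthy et al (1995). The returned pair of closest features identifies the piece: segment interior to segment interior, or endpoint to endpoint, gives a straight piece, while interior to endpoint gives a parabolic (quadratic) piece. The break points occur where a closest feature changes, which is what the perpendiculars in figure 4 mark.

```python
import math
from typing import Tuple

Point = Tuple[float, float]
Segment = Tuple[Point, Point]


def closest_feature(p: Point, seg: Segment) -> Tuple[str, float]:
    """Closest feature of seg to p ('a', 'b' or 'interior') and its distance."""
    (ax, ay), (bx, by) = seg
    dx, dy = bx - ax, by - ay
    t = ((p[0] - ax) * dx + (p[1] - ay) * dy) / (dx * dx + dy * dy)
    if t <= 0.0:
        return "a", math.hypot(p[0] - ax, p[1] - ay)
    if t >= 1.0:
        return "b", math.hypot(p[0] - bx, p[1] - by)
    return "interior", math.hypot(p[0] - (ax + t * dx), p[1] - (ay + t * dy))


def equidistant_point(x: float, e1: Segment, e2: Segment,
                      y_lo: float, y_hi: float, tol: float = 1e-12):
    """Bisection on the vertical line through x for d(p, E1) = d(p, E2).

    Assumes d(p, E1) - d(p, E2) is negative at y_lo and positive at y_hi.
    """
    def gap(y: float) -> float:
        return closest_feature((x, y), e1)[1] - closest_feature((x, y), e2)[1]

    lo, hi = y_lo, y_hi
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if gap(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    y = 0.5 * (lo + hi)
    kinds = (closest_feature((x, y), e1)[0], closest_feature((x, y), e2)[0])
    return y, kinds


def piece_degree(kinds: Tuple[str, str]) -> int:
    """1 for a straight piece, 2 for a parabolic piece (one endpoint involved)."""
    return 2 if (kinds[0] == "interior") != (kinds[1] == "interior") else 1


E1: Segment = ((0.0, 0.0), (4.0, 0.0))   # hypothetical lower edge
E2: Segment = ((1.0, 2.0), (3.0, 2.0))   # hypothetical upper edge
for x in (0.5, 2.0, 3.5):
    y, kinds = equidistant_point(x, E1, E2, 0.0, 2.0)
    print(f"x={x}: y={y:.4f}, closest features {kinds}, degree {piece_degree(kinds)}")
```

For these two segments the curve passes through (0.5, 1.0625) on a parabolic piece, (2, 1) on the straight middle piece and (3.5, 1.0625) on the second parabolic piece.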


3.1b(ii) Generation of skeletal curves - In the present work, the boundary primitives are straight lines, chains of straight-line segments representing curved edges, and reentrant vertices. The skeletal curve is a subset of the equidistant curve, constrained by the interference of a third boundary primitive in accordance with (43). Thus, each skeletal curve segment is a branch of the MA, and the union of these segments gives the MA of the whole domain. In figure 4, Q is a typical skeleton node generated by edge segment E3, and the segment SQ is an MA branch (skeletal curve segment) of the doublet (E1, E2).

Thus, the MA is composed of several such individual segments which are bounded by 
these skeleton nodes and, since each is independent of the effect of another, they are unique, 
disjoint and complete. Since a unique pair of boundary segments is used to generate each 
segment, such doublets are also unique, disjoint and complete. 


3.1b(iii) Subregion decomposition - The decomposition of an object into meshable subregions is preceded by the generation of the shape primitives, which are derived from the medial axis and are discussed next.
3.1b(iv) Shape primitives - The concept of a shape primitive, as proposed in Krishnamoorthy et al (1995), is introduced next. Let $\Omega$ be a bounded planar domain with boundary $A$; then, following the earlier definitions, $d(x, y)$ is the Euclidean distance from a point $x \in S$ to a point $y \in A$, where $S$ is the MA. Thus $d(x, y)$ is also the radius function of the MD at $x$. If a normalized coordinate $t$ is used for each MA branch and boundary segment, then $D(t)$ becomes the normalized Euclidean distance function and gives the lengths of the perpendiculars from the boundary segments onto the MA branch. At a skeleton node there are three such equal perpendiculars, from the two generating boundary edges and the one constraining boundary edge.

As the MA is defined as the locus of the centres of maximal disks tangential to the boundary segments, the distance function may be associated with the locus of the points of tangency of the MD with the boundary. In this context, let us introduce one more constraint in the definition of the MA: the MD is allowed to move inside $\Omega$ only such that its bounding box has at least two opposite edges lying on the boundary $A$. The locus $S'$ of the centre of the MD under this constraint is only a subset of the MA, $S$. This new shape attribute of the domain $\Omega$ is called the shape primitive (SP) of the domain. In geometric terms, the MA is a list of branches bounded by skeleton nodes and boundary nodes, whereas the SP is a list of branches bounded by skeleton nodes only. The algorithm for generating the SP differs markedly from the grassfire algorithm proposed by Patrikalakis & Gursoy (1990).

Previously, it was stated that the MA is a subset of the edge Voronoi diagram; the SP, in turn, is a subset of the MA.

Recently, Reddy & Turkiyyah (1995) presented the trimmed skeleton, which is identical to the SP proposed earlier by the authors.


3.1c Skeleton node classification and ray tracing methods: The branches of the shape primitives are identified by the type of skeleton nodes that bound them. The skeleton nodes are, in turn, identified by the number and configuration of the radial lines that can be traced from them to the object boundary $A$. In general, the rays from a skeleton node to the boundary segments depend on the relative positions of these segments in the 2D plane. For example, if three boundary edges are mutually adjacent, the SP branch that is generated degenerates to a single skeleton node. This is the typical branch node of the MA, where the MD touches $A$ at three places; three radial lines can therefore be drawn from such nodes, and they are named triple ray type nodes. Triple ray type nodes are similarly generated when at least two of the edge segments are adjacent or when all three edges are non-adjacent. When two non-adjacent edges generate an SP arc, double ray type nodes are usually introduced; these are basically the normal nodes of the MA, where the MD touches $A$ at two points. The double ray type node contradicts the definition of a skeleton node, but in domain decomposition applications, where the quality of the generated superelements is important, such nodes need to be introduced where the generating edge segments change curvature or, in general, possess large curvature. In the presence of reentrant corners, pseudo double ray type nodes are introduced, which remove the concavity. The generation of these nodes creates the geometric semiligatures described by Blum & Nagel (1973). In the presence of end zone arcs, pseudo double ray type or pseudo triple ray type nodes are generated, depending on the angle that the arc subtends in the domain. Thus, as a deviation from the usual MAT, the end points are replaced by either normal or branch points.

Thus, each branch of the SP is bounded by exactly two of the node types mentioned above. Each node type is in turn represented by the radial lines traced from it to the boundary of the object. Hence, such a representation retains all the shape properties of the parent domain. In the traditional MA methods, the width properties of the object are represented by the MD, while the axial properties are represented by the branches of the MA, which form an intrinsic coordinate system independent of the external Cartesian system used to represent the object. In the proposed approach, keeping in mind the subsequent domain decomposition, the representation is made by the branches of the SP and by the skeleton nodes, which specify the radial lines; the intrinsic coordinate system in this case therefore differs from the one given in Nagel & Blum (1976).


3.1d Decomposition into superelements: The SP branches of a domain, together with the radial lines and the boundary edge segments, form the basis of the subdivision of the domain into superelements. Superelements may be grouped into two classes, viz. end-zone superelements and body superelements. The end-zone superelements are quadrilaterals typically bounded by two adjacent boundary segments and two radial lines, with four nodes, one of which is a skeleton node and the other three boundary nodes. They are generated in regions where three boundary edges are adjacent, and their skeleton node usually marks one extremity of the SP of the domain. The body superelements may be triangular or quadrilateral and are characterized by one SP branch, two radial lines, two skeleton nodes and at least one boundary node. They may be formed anywhere inside the domain along the SP of the body. Together, these two types of superelements completely decompose the object into a set of non-overlapping, topologically simple subregions.

As stated in the earlier section, the proposed intrinsic coordinate system of the object 
represents the object as a set of piecewise continuous SP segments bounded by the skeleton 
nodes. For representing the width, radial lines are associated with skeleton nodes. The 
superelement generation is shown as a natural extension to this object recognition strategy 
and the superelements include the boundary definition of the object together with the 
information on the interior of the object in terms of the skeleton nodes, SP branches and 
radial lines. Thus, the superelements encapsulate all the geometric features of the domain 
and may be used to represent the domain in all further computations. 

The decomposed representation of the domain presented above is important from the point of view of mesh generation and attribute handling. The decomposition implies that if an engineering feature is assigned to a superelement, then this feature also becomes a property of every geometric entity extracted from that superelement. More globally, if a feature is assigned to any part of the parent domain, then all superelements that make up that part inherit the feature, and any geometric entity extracted from these superelements in turn inherits it too. For example, if a boundary condition is applied on a part of $A$, then all superelement edges that contain that part of the boundary inherit the same boundary condition, and all element nodes generated on such superelement edges also inherit it, irrespective of the nature of the FE mesh. A hierarchic relation is thus created in the proposed model, which suits adaptive analysis applications, since at every stage of the analysis the mesh is modified and the attribute data must be redistributed.
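The inheritance described above amounts to looking attributes up through the hierarchy rather than copying them onto each node. A minimal sketch, assuming a simple three-level model (boundary segment, superelement edge, generated mesh node); the class and field names are hypothetical and only illustrate the idea, not the authors' data structures.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class BoundarySegment:
    """A part of the domain boundary A carrying engineering attributes."""
    name: str
    attributes: Dict[str, str] = field(default_factory=dict)


@dataclass
class SuperelementEdge:
    """Superelement edge; boundary is None for radial lines and SP branches."""
    boundary: Optional[BoundarySegment] = None
    own: Dict[str, str] = field(default_factory=dict)

    def attributes(self) -> Dict[str, str]:
        # Inherit from the parent boundary segment, then apply edge-level values
        merged = dict(self.boundary.attributes) if self.boundary else {}
        merged.update(self.own)
        return merged


@dataclass
class MeshNode:
    """Node generated by the mesher on a superelement edge (or in the interior)."""
    x: float
    y: float
    edge: Optional[SuperelementEdge] = None

    def attributes(self) -> Dict[str, str]:
        # Looked up on demand, so nodes regenerated after remeshing see the same data
        return self.edge.attributes() if self.edge else {}


support = BoundarySegment("A1", {"bc": "fixed"})
edge = SuperelementEdge(boundary=support)
radial = SuperelementEdge()
nodes = [MeshNode(0.0, 0.25 * k, edge) for k in range(5)] + [MeshNode(1.0, 0.5, radial)]
```

Because nothing is copied, an adaptive step can discard and regenerate all nodes on `edge` without redistributing the boundary condition explicitly.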


3.1e Control and correction of superelements - the merging process: The quality of the superelements usually affects the quality of the elements generated inside them. If the set of boundary segments of $\Omega$ includes reentrant vertices or short boundary segments, then distorted superelements are generated, which may in turn cause large element distortions within them. The authors have also noticed that large taper distortions usually occur at convex vertices with large included angles. The merging process rectifies these anomalies by moving skeleton nodes toward one another to modify such geometry.

In Krishnamoorthy et al (1995), two such merging procedures viz. parallel shift and 
angular shift corrections were proposed based on the movements of skeleton nodes to 
correct the distorted superelements. A set of rules for moving the node was also laid 
out. 

3.2 Meshing by successive decomposition 

As stated earlier, local mesh control is of prime importance in adaptive mesh generation procedures; hence structured mesh generators, which have control only on the superelement edges, are not very useful.

Among the better-known quadrilateral mesh generators, the schemes of Zhu et al (1991), Talbert & Parkinson (1990) and Blacker & Stevenson (1991) are notable. The technique presented here overcomes the shortcomings of conventional mapping techniques and does not involve the computational complexities of other unstructured quadrilateral mesh generators either.

In the present method, the superelements are divided recursively using discrete curve segments generated by transfinite interpolation. Nodes are generated recursively on these segments from a proposed background grid to ensure complete internal local control of the mesh density. Multiple splitting methods are introduced to create transitions, leading to mesh gradation within the subdomain.

The general procedure of the proposed mesh generator is based on recursive splitting (with transitions). The procedure starts by discretizing the superelement boundaries into an even number of segments (Heighway 1983). The splitting procedure then starts from the boundary and divides the superelement into a set of 2, 3 or 4 child superelements, and the edges are split accordingly. The newly generated edges are discretized by nodes, and the child superelements are again considered for an even number of segments. The procedure continues until the edges of a child superelement cannot be segmented any further; that superelement is then recognized as a QUAD4 element, and the procedure continues until all the child superelements have been transformed in this way. The node-spacing information is obtained from a proposed background grid created from a postprocessed FE solution, i.e. it contains the nodal spacing data corresponding to some error-tolerance norm. A garbage collection algorithm was developed to handle the recursive operations. A detailed description of this process may be found in Krishnamoorthy et al (1995).
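A much-simplified sketch of recursive decomposition follows. It assumes a straight-sided quadrilateral superelement whose opposite edges carry equal segment counts, performs only two-way splits along straight lines (which are the transfinite, here bilinear, interpolation lines of a straight-sided quadrilateral), and ignores the background grid and the 3- and 4-way transition splits of the actual method; `decompose` and `quad_area` are hypothetical names.

```python
from typing import List, Tuple

Point = Tuple[float, float]
Quad = Tuple[Point, Point, Point, Point]  # corners p0, p1, p2, p3 counter-clockwise


def lerp(a: Point, b: Point, s: float) -> Point:
    return (a[0] + s * (b[0] - a[0]), a[1] + s * (b[1] - a[1]))


def decompose(q: Quad, nu: int, nv: int, out: List[Quad]) -> None:
    """Recursively split q, whose edges p0-p1 and p3-p2 carry nu segments and
    edges p0-p3 and p1-p2 carry nv segments, until every child is a QUAD4."""
    p0, p1, p2, p3 = q
    if nu == 1 and nv == 1:
        out.append(q)  # edges cannot be segmented further: accept as QUAD4
        return
    if nu >= nv:
        # Split across the u-direction at the middle segment boundary
        k = nu // 2
        a, b = lerp(p0, p1, k / nu), lerp(p3, p2, k / nu)
        decompose((p0, a, b, p3), k, nv, out)
        decompose((a, p1, p2, b), nu - k, nv, out)
    else:
        # Split across the v-direction at the middle segment boundary
        k = nv // 2
        a, b = lerp(p0, p3, k / nv), lerp(p1, p2, k / nv)
        decompose((p0, p1, b, a), nu, k, out)
        decompose((a, b, p2, p3), nu, nv - k, out)


def quad_area(q: Quad) -> float:
    """Shoelace formula for a simple quadrilateral."""
    return 0.5 * abs(sum(q[i][0] * q[(i + 1) % 4][1] - q[(i + 1) % 4][0] * q[i][1]
                         for i in range(4)))


superelement: Quad = ((0.0, 0.0), (2.0, 0.0), (2.5, 1.5), (0.0, 1.0))
elements: List[Quad] = []
decompose(superelement, 4, 2, elements)
```

Each split halves the larger segment count, so the recursion depth grows only logarithmically with the number of segments, and the children tile the parent exactly.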


4. Refinement criteria - a new h-refinement strategy

In this section the authors propose a new h-refinement strategy that deviates from the conventional refinement strategy, and it is shown that the new strategy yields a better convergence rate for the problems solved.


4.1 Conventional h-refinement criteria 







The conventional h-refinement procedure was first proposed by Zienkiewicz & Zhu (1987). The element error estimators were used to compute new element sizes at the centroids of the old elements, on the assumption that the optimal mesh is the one in which the error is equally distributed among all the elements. The general derivation is as follows.

Let a set of elliptic PDEs defined over a typical domain $\Omega$ be

$$Lu = q, \qquad (44)$$

subject to the boundary conditions

$$u = \bar{u} \ \text{on } \Gamma_D, \qquad \partial u / \partial n = \bar{q} \ \text{on } \Gamma_N.$$

In typical linear elasticity applications, $L$ is a linear differential operator, $u$ is the unknown displacement function, $q$ is the body force term, and (44) is the equilibrium equation. If the maximum element diameter of an FE discretization is $h_e$ and the degree of the interpolating polynomial is $n$, then the error in displacement is given by

$$E_u(h) = O(h^{n+1}) \le C h^{n+1}, \qquad (45)$$

where $C$ is some constant.

If the stresses and strains (i.e. general derivatives) are given by the $m$th derivative, then

$$E_\sigma(h) = O(h^{n-m+1}). \qquad (46)$$

The error bound of the strain energy, which is a quadratic functional of the displacement, then becomes

$$E_e(h) = O(h^{2(n-m+1)}). \qquad (47)$$

In another form, the norm of the error in energy becomes

$$\|E_e\| = O(h^{(n-m)+1}), \qquad (48)$$
where the global energy norm error $\|E_e\|$ is computed as

$$\|E_e\| = \left[ \int_{\Omega} [\sigma^* - \sigma_h]^T [D]^{-1} [\sigma^* - \sigma_h] \, d\Omega \right]^{0.5}. \qquad (49)$$

The local energy norm error is given by

$$\|E_e\|_i = \left[ \int_{\Omega_i} [\sigma^* - \sigma_h]^T [D]^{-1} [\sigma^* - \sigma_h] \, d\Omega \right]^{0.5}. \qquad (50)$$

The local and global energy norm errors are related as

$$\|E_e\|^2 = \sum_{i=1}^{M} \|E_e\|_i^2, \qquad (51)$$

where $\sigma^*$ = smoothed stress, $\sigma_h$ = FE stress, $\Omega_i$ = element domain, $[D]$ = elasticity (constitutive) matrix, and $M$ = total number of elements.

The convergence order of the global energy norm error is the same as that of the errors in the stresses; hence an accurate stress projection method will automatically accelerate the convergence in the energy norm (or $L_2$ norm). Thus,

$$\|E_e\| \le C_1 O(h^{(n-m)+1}), \qquad (52)$$

where $C_1$ is a constant depending on the element aspect ratio, quadrature rule, etc. In fact, this constant may be shown to depend on some norm of the displacement function $u$. As the element size tends to zero ($h \to 0$), the above relation tends to an equality, with the bound attained asymptotically. It has been shown in Babuska & Szabo (1982) and Zienkiewicz & Zhu (1992) that (52) indeed holds if the stress extraction is based on superconvergent principles. If $N$ is the number of global degrees of freedom in the system and $K$ is an arbitrary constant, then

$$N = K / h^2. \qquad (53)$$


Substituting (53) into (52), the following condition is obtained:

$$\|E_e\| \le C_1 N^{-[1+(n-m)]/2} \qquad (54)$$

or

$$\|E_e\| \le C_1 N^{-p/2}, \qquad (55)$$

where $p = 1 + n - m$ and the constant $K$ has been absorbed into $C_1$.

The bound given above is valid for smooth solutions in regular domains. For non-regular domains, the following modification is suggested:

$$\|E_e\| \le C_1 N^{-\min(\lambda, p)/2}, \qquad (56)$$

where $\lambda$ is a parameter whose value depends on the singularity in the domain. For elastic problems, $\lambda$ is usually 0.50 for closed cracks and 0.71 for a 90° corner.
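The exponents in (55) and (56) are easy to tabulate, and the same exponent can be estimated from two successive meshes. In the sketch below the function names are assumptions; the observed exponent is the slope of $-\log\|E_e\|$ against $\log N$, which should approach $\min(\lambda, p)/2$ as the mesh is refined.

```python
import math
from typing import Optional


def predicted_exponent(n: int, m: int, lam: Optional[float] = None) -> float:
    """Exponent q in ||E_e|| <= C1 * N**(-q): p/2 from (55), min(lam, p)/2 from (56)."""
    p = 1 + n - m
    return (p if lam is None else min(lam, p)) / 2.0


def observed_exponent(n1: float, e1: float, n2: float, e2: float) -> float:
    """Slope of -log||E_e|| versus log N between two meshes (N1, E1) and (N2, E2)."""
    return -math.log(e2 / e1) / math.log(n2 / n1)
```

For bilinear QUAD4 elements ($n = 1$) with stresses as first derivatives ($m = 1$), $p = 1$, so the predicted exponent is 0.5 for smooth problems and drops to 0.25 for a closed crack with $\lambda = 0.50$; halving the error while quadrupling $N$ corresponds to an observed exponent of 0.5.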

Now, let $\bar{\eta}$ be the target global error fraction and $\eta$ the actual relative percentage error; then

$$\eta = \frac{\|E_e\|}{(\|E_e\|^2 + \|u\|^2)^{0.5}}, \qquad (57)$$

where $\|u\|$ is the strain energy of the body. This equation is valid in the local as well as the global form; hence both local and global relative percentage energy error measures can be computed.

Let a reference norm be given by

$$\|\bar{u}\| = \left[ \|u\|^2 / M \right]^{0.5}, \qquad (58)$$

where $\|u\|$ is the energy computed on the basis of an improved stress field and $M$ is the number of elements. The upper bound of the (elementwise) local error estimate is then set as

$$\|E_e\|_i \le \bar{\eta} \, \|\bar{u}\|. \qquad (59)$$

It has been shown by Zienkiewicz & Zhu (1987) that over-refinement may occur if the reference norm is based on the local energy level, so a reference norm based on the global energy is normally used. The above equation implies that, as $h$ tends to zero, the element-level energy error reaches a fraction of the reference energy equally over all elements. Thus, in the limit, the optimality condition is reached when all the elements have equal amounts of error in energy.

The size indicator is defined as

$$\xi_i = \frac{\|E_e\|_i}{\bar{\eta} \, \|\bar{u}\|}, \qquad (60)$$

where $\xi_i$ is the size indicator used to change the size of the element as

$$h_i^{new} = h_i^{old} / \xi_i^{1/p}, \qquad (61)$$

where $h_i^{new}$ and $h_i^{old}$ are the new and old element sizes respectively.
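The conventional procedure of (51) and (57) to (61) can be condensed into a few lines. This is a sketch under stated assumptions: the element error norms and the energy norm $\|u\|$ are taken as already computed from the smoothed stresses, element errors are assumed strictly positive, and the size update is taken as $h^{new} = h^{old}/\xi^{1/p}$ with $p$ from (55); the function names are hypothetical.

```python
import math
from typing import List, Sequence


def global_error(local_err: Sequence[float]) -> float:
    """(51): global ||E_e|| from the element error norms ||E_e||_i."""
    return math.sqrt(sum(e * e for e in local_err))


def relative_error(err_norm: float, u_norm: float) -> float:
    """(57): eta = ||E_e|| / (||E_e||^2 + ||u||^2)^0.5."""
    return err_norm / math.sqrt(err_norm ** 2 + u_norm ** 2)


def new_element_sizes(h_old: Sequence[float], local_err: Sequence[float],
                      u_norm: float, eta_target: float,
                      p: float = 1.0) -> List[float]:
    """Conventional h-refinement: equal error per element at the target level."""
    m = len(h_old)
    reference = math.sqrt(u_norm ** 2 / m)            # reference norm, (58)
    permissible = eta_target * reference              # bound of (59)
    xi = [e / permissible for e in local_err]         # size indicator, (60)
    return [h / x ** (1.0 / p) for h, x in zip(h_old, xi)]  # assumed form of (61)
```

Elements with $\xi_i > 1$ shrink and those with $\xi_i < 1$ grow; for two unit elements with errors 0.1 and 0.4, $\|u\| = \sqrt{2}$ and a 10% target, the new sizes are 1.0 and 0.25 for $p = 1$.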


4.2 Proposed h-refinement criteria 


In the proposed h-refinement criterion, both the reference norm and the target global error fraction are modified. The reference norm in this case is based on the weighted average of the energy norm, computed as

$$\|W(u)\| = \left[ \sum_{i=1}^{M} \|u\|_i^2 A_i \right]^{0.5}, \qquad (62)$$

where $\|u\|_i$ is the local element-level strain energy computed from an improved stress field and $A_i$ is the area of element $i$.

The global energy density may be defined as

$$D_g = \left[ \|u\|^2 / A \right]^{0.5}, \qquad (63)$$

where $A$ is the area of the domain. The local energy density is defined as

$$D_i = \left[ \|u\|_i^2 / A_i \right]^{0.5}. \qquad (64)$$
Thus, the modification to the target global error fraction is given as

$$\bar{\eta}_i = \bar{\eta} \, (D_g / D_i)^{\alpha}, \qquad (65)$$

where $\alpha$ is a parameter that lies between 1.00 and 1.25 for most problems. The above equation reduces to the original target error fraction $\bar{\eta}$ when $\alpha$ is zero. In physical terms, it reduces the target error fraction for elements whose energy density exceeds the energy density of the system. This modification therefore automatically forces the element sizes to be smaller where there are large stress excursions. The modification to the reference norm is given as

$$\|\bar{u}\|_{mod} = \|W(u)\| / A^{0.5}, \qquad (66)$$

where $\|\bar{u}\|_{mod}$ is the modified reference norm. Instead of the average of the energy computed in (58), this norm computes the weighted average of the energy. When all the elements are of equal size, the modified reference norm reduces to the original reference norm. If the $L_2$ norm is used instead of the energy norm and Von Mises stresses are used instead of Cauchy stresses, then this refinement criterion becomes the adaptive accuracy scheme reported by Grosse et al (1992). The weighted average energy norm was first discussed in the context of structural shape optimization by Bugeda & Oliver (1991), where the refinement condition takes the form

$$\|E_e\|_i \le C_1 h_i^{\lambda} \|R\|, \qquad (67)$$

where $\|E_e\|_i$ is usually the local error in energy, the constant $C_1$ is the user-defined target global error fraction, $h_i$ is some measure of the element size, $\lambda$ is a function of the convergence rate and $\|R\|$ is the reference norm.
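The proposed modifications can be sketched in the same way. The sketch assumes that the element norms satisfy $\|u\|^2 = \sum_i \|u\|_i^2$ and takes the area-weighted forms $\|W(u)\| = [\sum_i A_i \|u\|_i^2]^{0.5}$, $\|\bar{u}\|_{mod} = \|W(u)\|/A^{0.5}$, $D_g = \|u\|/A^{0.5}$, $D_i = \|u\|_i/A_i^{0.5}$ and $\bar{\eta}_i = \bar{\eta}(D_g/D_i)^{\alpha}$, which reduce to the conventional reference norm for equal element areas, as the text requires; element energies are assumed strictly positive, and the function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def proposed_criterion(areas: Sequence[float], u_local: Sequence[float],
                       eta_target: float, alpha: float = 1.0
                       ) -> Tuple[List[float], float]:
    """Element target fractions eta_i and the modified reference norm."""
    area = sum(areas)
    u_sq = sum(u * u for u in u_local)                 # ||u||^2 from element terms
    d_global = math.sqrt(u_sq / area)                  # global energy density
    d_local = [u / math.sqrt(a) for u, a in zip(u_local, areas)]      # local densities
    eta_i = [eta_target * (d_global / d) ** alpha for d in d_local]  # modified targets
    w_norm = math.sqrt(sum(a * u * u for a, u in zip(areas, u_local)))
    ref_mod = w_norm / math.sqrt(area)                 # modified reference norm
    return eta_i, ref_mod


def size_indicators(local_err: Sequence[float], eta_i: Sequence[float],
                    ref_mod: float) -> List[float]:
    """xi_i = ||E_e||_i / (eta_i * ||u||_mod); values above 1 call for refinement."""
    return [e / (t * ref_mod) for e, t in zip(local_err, eta_i)]


eta_i, ref_mod = proposed_criterion([1.0, 1.0], [1.0, 3.0], eta_target=0.05)
xi = size_indicators([0.1, 0.1], eta_i, ref_mod)
```

With two equal-area elements carrying the same error but different energies, only the element with the higher energy density ($\xi = 1.2$) is marked for refinement, while the other ($\xi = 0.4$) may be coarsened.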

In this context, it may be relevant to discuss some other previously published refinement criteria, which are either SED-based or SED-enhanced, and their differences from the proposed criterion.

Melosh & Marcal (1977) defined an SED-based refinement criterion, which was used for mesh enrichment strategies. The differences in the SED between the centroid of the element and the other Gauss points were computed, and depending on the magnitude of these differences the element was divided into 4 subelements. No strict refinement strategy was thus followed, and there is no explicit computation of the terms in (67), which makes it totally distinct from the proposed strategy.

Botkin & Bennet (1986) treated the variation in strain energy as a measure of the error 
in the FE solution. Thus, this was also an SED-based estimate and the refinement equation 
was given by 

$$\|V_e\|_i \le C_1\, h_e^{k}\, \|D^k V\|, \tag{68}$$

where $\|V_e\|_i$ is the elemental error in SED, $C_1$ is a proportionality factor, $h_e$ is an element size measure, $\|D^k V\|$ is the $k$-th variation in SED, and $k = 1$ for linear problems.

Equation (68) is similar in structure to (67). Note, however, that in (68) both the reference norm and the local error are expressed in terms of SED, which is quite different from the proposed strategy, where the SED is used to modify only the factor $C_1$; there, the reference norm and the local error are expressed in terms of weighted energy and strain energy, respectively.

Figure 5. Example problems: (a) bracket and (b) cracked panel.

The technique proposed by Cedillo & Bhatti (1988) is again a simple mesh enrichment procedure, in which a given element is broken into 4 subelements whenever the following relationship holds:

$$\mathrm{LSED} > \mu\, \mathrm{GSED}, \tag{69}$$

where LSED is the local (elemental) strain energy density, GSED is the global strain energy density, and $\mu$ is a tolerance limit.

In this strategy, no explicit error estimation is performed, so no specific refinement strategy can be followed; this distinguishes it from the proposed strategy.
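For contrast, the Cedillo & Bhatti test (69) needs no error estimate at all: it is a plain density comparison. In the sketch below the function name, the data and the choice of GSED as total strain energy divided by total area are assumptions made for illustration; the tolerance limit is written `mu`.

```python
def cedillo_bhatti_flags(element_energy, element_area, mu):
    """Flag elements for subdivision into 4 subelements when LSED > mu * GSED.

    element_energy : strain energy of each element (hypothetical input)
    element_area   : area of each element
    mu             : tolerance limit
    Assumption: GSED = total strain energy / total area.
    """
    gsed = sum(element_energy) / sum(element_area)
    flags = []
    for energy, area in zip(element_energy, element_area):
        lsed = energy / area              # local (elemental) strain energy density
        flags.append(lsed > mu * gsed)    # True -> split the element
    return flags


print(cedillo_bhatti_flags([1.0, 4.0, 1.0, 2.0], [1.0, 1.0, 1.0, 1.0], mu=1.5))
```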

Lee & Lo (1992) use an SED-enhanced scheme, as does the current scheme. In Lo's scheme, it is argued that the ratio of LSED to GSED is very high for elements near singularities. To accommodate this effect, the convergence rate $\lambda$ in (67) is modified by including this ratio, so that the convergence rate increases in the presence of singular zones. This method therefore also differs from the proposed strategy, in which the convergence rate is unchanged but the target global error fraction is modified so that it decreases in the presence of a singularity. Nor does Lo's algorithm modify the reference norm, as is done in the proposed method.

5. Case studies

To demonstrate the efficiency of the proposed refinement scheme over the conventional refinement scheme, two plane elasticity examples are selected, as shown in figure 5, and adaptive FEA is performed with a target global error of 5%. The meshes are shown in figures 6 and 7, the stress distribution of the cracked panel problem is shown in figure 8, and the convergence plots are shown in figure 9. Of the two problems solved, the bracket is characterized by a complicated geometry and the open crack problem is characterized by a singularity at the crack tip. For the non-singular bracket problem, both refinement strategies yield almost similar results in terms of convergence rate. However, with the modified refinement strategy, the mesh shows better localization in the high stress zones. Thus, in non-singular problems, the new refinement strategy brings about a measure of directional refinement, yielding an r-h refinement process. For the bracket problem, the conventional refinement strategy yields an error percentage of 4.77 with a mesh of 3900 degrees of freedom, whereas with the proposed refinement strategy, 3546 degrees of freedom in the final mesh yield an error percentage of 4.71. Thus, the proposed strategy yields a more economical solution.

Figure 6. Adaptive analysis of a bracket: (a) conventional and (b) proposed strategies. (a) Meshes with 746 DOF (relative percent error 9.55) and 3900 DOF (4.77); (b) meshes with 572 DOF (10.94) and 3546 DOF (4.71).

In the case of the singular problem, the new refinement strategy yields a better convergence rate in addition to a better localization of the mesh. Thus, in such problems, the r-h method actually shifts nodes closer to the singular zones, resulting in a higher convergence rate for the same number of degrees of freedom. Using the conventional refinement strategy, 2760 degrees of freedom in the final mesh yield an error percent of 4.72, while the proposed strategy requires only 2524 degrees of freedom to yield an error percent of 4.15. Thus, in the case of this singular problem also, the proposed strategy results in a more economical solution.

Figure 7. Adaptive analysis of a cracked panel: (a) conventional and (b) proposed strategies. (a) Meshes with 260 DOF (relative percent error 14.45) and 2760 DOF (4.72); (b) meshes with 226 DOF (13.97) and 2524 DOF (4.15).
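The relative savings in the two case studies follow directly from the final-mesh data of figures 6 and 7. The short check below (the helper `dof_saving_percent` is hypothetical) gives about 9.1% fewer degrees of freedom for the bracket and about 8.6% for the cracked panel, with a lower final error in both cases.

```python
# Final-mesh DOF counts and relative percent errors (figures 6 and 7)
final_meshes = {
    "bracket": {"conventional": (3900, 4.77), "proposed": (3546, 4.71)},
    "cracked panel": {"conventional": (2760, 4.72), "proposed": (2524, 4.15)},
}


def dof_saving_percent(conventional_dof, proposed_dof):
    """Percentage reduction in degrees of freedom achieved by the proposed strategy."""
    return 100.0 * (conventional_dof - proposed_dof) / conventional_dof


for problem, runs in final_meshes.items():
    conv_dof, conv_err = runs["conventional"]
    prop_dof, prop_err = runs["proposed"]
    saving = dof_saving_percent(conv_dof, prop_dof)
    print(f"{problem}: {saving:.1f}% fewer DOF (error {conv_err}% -> {prop_err}%)")
```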


6. Conclusions 

Based on the work presented here, the following conclusions may be drawn. 

(1) The skeleton-based domain decomposition procedure is ideally suited for the decomposition of complex objects into simple mappable subregions.

(2) Since the mesh generator is activated in very simple domains (i.e. either quadrilateral or triangular), mesh generation is fast.

(3) As each superelement is considered in turn for mesh generation, this method is ideally suited for parallel/distributed computing applications. In sequential computing environments also, substructuring may easily be adopted to bring down computational costs.

(4) The superelement generation process automatically eliminates convex corners from the meshable regions.

(5) The mesh generation process is based on very general splitting procedures; hence this technique may also be used in problems of higher dimensions.
Figure 8. Cracked panel: variation of $\sigma_{xx}$ (a), $\sigma_{yy}$ (b) and $\sigma_{xy}$ (c) (stresses in kN/cm²).
(6) The proposed refinement strategy shows asymptotic rates of convergence, thus dispensing with the equal error distribution paradigm. In fact, it is shown that errors are distributed in proportion to element sizes, which automatically imposes more stringent local refinement criteria in high stress zones.

(7) An r-h adaptive method is embedded in an h-refinement framework by using the proposed strategy.

(8) In both problems solved, the modified strategy requires fewer degrees of freedom than the conventional strategy to achieve the same global accuracy level. Thus, the modified refinement strategy makes the adaptive process more economical.

(9) The proposed refinement strategy speeds up convergence for singularity-dominated problems, thus imparting greater reliability to the FE solutions of such problems.


7. Future direction of research 

(1) Adaptive analysis of Reissner-Mindlin (R-M) plates using field-consistent 4-node elements was reported by the authors (Reddy & Turkiyyah 1995), where the effects of plate thickness and boundary conditions on adaptivity were discussed.

(2) The skeletal decomposition of general parametric surfaces has been taken up for adaptive analysis of shells.

(3) The mesh generation process is being modified to generate quadrilateral elements on 4-sided doubly curved superelements.

(4) An optimization-based global-local error estimate is being developed for local pointwise error control and asymptotic convergence rates.

(5) A new patchwise superconvergent stress recovery procedure is under development, incorporating residuals due to equilibrium violation, natural boundary condition violation and interelement stress differences.
Abstract. FEPACS (Finite Element Package for linear static, dynamic and instability Analysis of Composite Structures under hygro-thermo-mechanical loads) incorporates a complete library of consistent and correct 1-, 2- and 3-dimensional linear and quadratic general purpose finite elements. In this paper, we shall discuss the finite element technology that has gone into the package as well as its present modelling and solution capabilities. We shall also briefly discuss recent developments toward enhancing the package: robust composite elements based on a C⁰-continuous higher order transverse deformation plate/beam theory, and nonlinear element technology and solution strategies. Finally, we shall briefly touch upon several satellite application modules that are in different stages of planning/development to aid FEPACS: damage assessment/prediction, expert-like advisors for solid modelling and finite element modelling/analysis, pre-/post-processing for FEPACS applications, structural optimisation and related finite element algorithms, and finally, a frontal solution module for FEPACS to enhance its feasibility for vectorisation/parallelisation.

Keywords. Analysis of composite structures; hygro-thermo-mechanical loads; linear structural analysis; finite element package.


1. Prelude 

Today, owing to very high demands on functional excellence and technological perfection, the availability of advanced material and design technology, the advent of advanced mathematical and computational tools and computer technology, and finally stringent safety and economic constraints, precise and accurate prediction of structural behaviour and strength under complex and adverse environments is called for, particularly in aerospace and automobile applications. The finite element method (FEM) appears to be the only computational tool that can be used successfully for high precision structural analysis, meeting current demands as well as expectations in the near future.

Due to its versatility and simplicity in applications, finite element analysis (FEA) has 
attracted researchers from many different engineering fields, e.g. aerospace, civil, mechanical, automobile, electrical and so on, particularly with reference to structural applications,
evolved for formulating robust finite elements that are free of all these errors under any circumstances. A library of general purpose linear and quadratic finite elements has been developed for 1-, 2- and 3-dimensional structural modelling/analysis. FEPACS uses such scientifically proven element technology and incorporates a complete library of state-of-the-art robust elements. We review this work here.


2. Introduction 

Understandably, displacement type finite elements have been the standard for developing the element library of any general purpose package, since such formulations are simple to code and economical to use. In addition, displacement type elements can be made robust (simple, but efficient) even in their distorted configurations. Also, in general purpose finite element software, C⁰-continuous element formulations are sought, since such formulations can satisfy the required continuity conditions exactly and allow one to use only the engineering degrees of freedom to model a problem. In the case of flexural elements (e.g. beams, plates and shells), they permit modelling of transverse deformation (e.g. transverse shear), which becomes important for laminated and/or thick structures. The first C⁰-continuous flexural finite element formulation was developed in the late 60's: the 8-noded quadratic degenerated shell element (Ahmed et al 1970).

However, soon after they were discovered, the C⁰-continuous flexural elements were found to be crippled by dramatic failures such as shear locking and stress oscillations, even when the conventional continuity and completeness requirements were satisfied (Doherty et al 1969; Pawsey & Clough 1971; Zienkiewicz et al 1971). Later, similar situations were recognised: membrane locking in curved elements (Stolarski & Belytschko 1981), parasitic shear (Cook 1975), incompressibility locking (Fried 1974), nonlinear locking (Naganarayana & Prathap 1996), etc. Many ad hoc techniques are offered in the literature to alleviate these different types of locking, such as reduced/selective integration (Pawsey & Clough 1971; Zienkiewicz et al 1971), assumed strain methods (MacNeal 1982; Prathap 1993), addition of incompatible (bubble) modes (Wilson et al 1973), quasi-conforming techniques (Tang et al 1984) and mixed/hybrid formulations (Pian 1971), to name a few, with varied success. Often these methods lacked a scientific explanation either for their success or for their failure. Though some attempts were made to explain locking, e.g. through the singularity of the shear stiffness (Zienkiewicz 1977) or through constraint counting and the rank of the shear stiffness (Cook et al 1981; Hughes 1987), they generally tried to locate the symptoms of the problem rather than its cause (Prathap 1986). They lacked a scientific basis and could not explain the locking phenomena in a unified sense. Finally, they never offered any methodology for eliminating the locking errors; in other words, they could not explain why certain ad hoc techniques eliminated locking under certain circumstances.

In the early 80's (Prathap & Bhashyam 1982), a scientifically valid paradigm was introduced to explain the existence of locking in a Timoshenko beam element and the success of the reduced integration technique. Over the next fifteen years, this paradigm matured into a scientific principle of the finite element method, field consistency, offering explanations for the existence of locking, delayed convergence and stress oscillations in the so-called class of constrained media elasticity (Prathap 1986), and a library of field-consistent linear elements (elements with linear field interpolations) was developed (Ramesh Babu 1985; Ramesh Babu et al 1985).

Later, the field-consistency paradigm was implemented in several quadratic finite elements. New consistency paradigms were offered to explain the difficulties that arose there and to provide methods to eliminate these errors: consistent mapping (Prathap & Naganarayana 1992), edge consistency (Prathap & Somashekar 1988), stress/initial-strain field consistency (Prathap & Naganarayana 1990a), warping correction (Naganarayana & Prathap 1989a), and variational correctness (Prathap 1988). These principles were critically examined and implemented in several general purpose quadratic displacement type 1-, 2- and 3-dimensional elements (Naganarayana 1991). A complete library of robust, consistent and correct linear and quadratic elements is now available (Prathap et al 1994a). An extensive treatment of the different paradigms, apart from the classical continuity and completeness requirements, was given recently by Prathap (1993, 1994).

This library of robust linear and quadratic 1-, 2- and 3-dimensional elements was proposed to be implemented in a general purpose finite element platform, using the solution capabilities and data management of SAP-IV (SAP: structural analysis program), which was available in source form in the public domain (Prathap et al 1989), for modelling and analysing advanced anisotropic and laminated composite structures effectively. Initially, the basic laminated composite beam and shell elements were implemented in the SAP-IV infrastructure on a UNIVAC system. The complete element library was subsequently implemented in the UNIX operating system on PC386/486 platforms and on workstations. The first version of the package, FEPACS (Finite Element Package for Analysis of Composite Structures), was released in 1991 for linear static and dynamic structural analysis on a PC386/486 platform in a UNIX environment (Prathap & Naganarayana 1991). Later, the eigenvalue solution capability was enhanced to accommodate the consistent mass description and then extended to the analysis of structural instability (Naganarayana et al 1993). The current version, FEPACS-2.0 (Prathap et al 1994b), has more than 15,000 (executable) lines of in-house developed code out of about 20,000 lines. A diagrammatic description of the package is given in figure 1.

Currently, several satellite modules tailored around FEPACS-2.0 have been initiated at the National Aerospace Laboratories, Bangalore: expert-advised finite element modelling and adaptive mesh refinement, pre- and post-processors, damage/failure mechanics, nonlinear structural mechanics and automated post-buckling solution capability, robust higher order shear flexible finite elements, hygro-thermal effects on structural behaviour, structural optimisation, and frontal solution modules for linear structural analysis.

In this paper, the state-of-the-art finite element technology and the finite element library in FEPACS-2.0, as well as the satellite structural modelling/analysis modules currently developed or planned for FEPACS applications, are briefly described.


3. Finite element technology - the C-concepts 

In this section, we shall discuss the state-of-the-art finite element technology of FEPACS and its scientific foundations.






Figure 1. Finite element package for analysis of composite structures - FEPACS, version-2.1.
In the finite element method, we discretise the continuum/structure into a number of finite-sized divisions called elements; capture the basic equations of solid/structural mechanics, namely the strain-displacement relations (such that compatibility conditions are satisfied over the element domain), the stress-strain (constitutive) relations and the element loads, to form the total potential energy (complementary energy in the case of stress type formulations) of each element; assemble the element total potential energies such that the compatibility conditions are satisfied across the element boundaries and the boundary conditions on the structural boundary are satisfied; and apply the minimum total potential energy principle (minimum complementary energy principle in the case of stress type formulations) to obtain the global equilibrium equations as a set of simultaneous algebraic equations (one associated with each unspecified degree of freedom in the finite element model, in terms of structural stiffness, nodal degrees of freedom and nodal forces), which can be solved by any established numerical method. Out of the several approaches proposed in the literature, the displacement type approach is now used for most general purpose structural applications.

It is usually believed that finite element analysis produces discretized displacement fields which are approximations of the actual solution, and that the strains and stresses are then derived as the derivatives of the displacement field in each element: the displacement correspondence paradigm (e.g. Barlow 1976; MacNeal 1994). Hence, it is believed that the displacements are sampled to the most accurate order in a finite element and that strains and stresses are always computed to one order less than the displacements. However, recent studies indicate that it is the strain field that is always directly sampled to form the strain energy in energy-based methods. Hence, it is argued that, in finite element analysis, a best-fit solution is always sought for the strain and stress fields within an element, and the displacement field is computed in an integral sense from these sampled values of strains and stresses such that the minimum total potential energy principle is satisfied in the assembled finite element model; this can be called the stress correspondence paradigm (Prathap 1996).
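A one-dimensional sketch can make the stress correspondence argument concrete. Under the assumptions of a fixed-free bar with unit axial rigidity EA, a hypothetical linearly varying axial load $q(x) = x$, and 2-noded linear elements with consistent loads, the constant strain computed in each element coincides with the element average of the exact strain, i.e. its least-squares best fit by a constant. In this special 1D case the nodal displacements also come out exact. This only illustrates the paradigm; it is not FEPACS code.

```python
import numpy as np

L, n_el = 1.0, 4                       # bar length and number of elements (EA = 1 assumed)
nodes = np.linspace(0.0, L, n_el + 1)
h = L / n_el


def q(x):
    """Hypothetical linearly varying axial load per unit length."""
    return x


def u_exact(x):
    """Exact solution of u'' = -q(x) with u(0) = 0 and u'(L) = 0."""
    return L**2 * x / 2.0 - x**3 / 6.0


K = np.zeros((n_el + 1, n_el + 1))
F = np.zeros(n_el + 1)
gauss_pts = np.array([-1.0, 1.0]) / np.sqrt(3.0)      # 2-point Gauss rule, unit weights
for e in range(n_el):
    a, b = nodes[e], nodes[e + 1]
    idx = [e, e + 1]
    K[np.ix_(idx, idx)] += np.array([[1.0, -1.0], [-1.0, 1.0]]) / h
    for xi in gauss_pts:
        x = 0.5 * (a + b) + 0.5 * h * xi
        N = np.array([(b - x) / h, (x - a) / h])        # linear shape functions
        F[idx] += N * q(x) * 0.5 * h                    # consistent load (exact for quadratics)

u = np.zeros(n_el + 1)
u[1:] = np.linalg.solve(K[1:, 1:], F[1:])             # u(0) = 0 imposed by elimination

strain_fe = np.diff(u) / h                            # one constant strain per element
strain_best_fit = np.diff(u_exact(nodes)) / h         # element mean of the exact strain
print(np.allclose(strain_fe, strain_best_fit))
print(np.allclose(u, u_exact(nodes)))
```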

Once the basis of the finite element method (for structural analysis, in particular) is understood in terms of the stress correspondence paradigm, we still face the questions: How accurate can the finite element method be? What are the prerequisites for assuring the correct convergence rate for a given finite element model? To answer these questions, we need to examine how the element strain-displacement relations are formed, since the rest of the method is well understood. Conventionally, it is argued that the mathematically expected convergence rate can be obtained from a finite element model by satisfying the so-called continuity and completeness conditions, which can be broadly described as follows.

The Continuity condition requires that the interpolation functions chosen for each field variable be such that the corresponding field and its successive derivatives up to the $(n-1)$th order are continuous over the element, and that the same is ensured across the element boundaries as well, where the $n$th order derivative of the field is required to define the structural strain energy. Accordingly, the formulation is referred to as $C^{n-1}$-continuous.

The Completeness condition requires that the field interpolations chosen be such that rigid body motion does not produce strains, that constant strain states of the element are represented, and that no intermediate polynomial terms are dropped while interpolating the displacement fields.

Table 1. Types of errors, their sources in finite element analysis and associated paradigms.

| Source | Symptoms | Paradigm/concepts |
|---|---|---|
| Errors of first kind | | |
| Finite element discretisation | Discretisation errors | Continuity and completeness |
| Errors of second kind | | |
| Constrained media elasticity | Locking, delayed convergence, stress oscillations | Constrained-field consistency |
| Element distortion (nonuniform mapping) | Locking, delayed convergence, stress oscillations | Edge consistency and consistent mapping |
| Varying material/sectional moduli | Stress oscillations | Unconstrained-field consistency |
| Initial-strain/initial-stress representation | Stress oscillations | Unconstrained-field consistency |
| Reconstitution of strain/stress fields | Poorer convergence, spurious load mechanisms and stress oscillations | Variational correctness |
| Modelling warped surface with plane elements | Erroneous stresses and displacements | Warping correction (minimum virtual work principle) |
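The completeness condition can be checked numerically for a given set of shape functions. The sketch below (hypothetical helper names) verifies on the natural interval $[-1, 1]$ that the 2-noded linear and 3-noded quadratic Lagrangian interpolations reproduce a constant field (rigid-body translation, hence zero strain) and a linear field (constant strain). It also shows that a 2-noded set built from $\{1, \xi^3\}$, which drops the intermediate linear term, fails the constant-strain test.

```python
import numpy as np


def linear_shape(xi):
    """2-noded Lagrangian shape functions (nodes at xi = -1, 1)."""
    return np.array([(1 - xi) / 2, (1 + xi) / 2])


def quadratic_shape(xi):
    """3-noded Lagrangian shape functions (nodes at xi = -1, 0, 1)."""
    return np.array([xi * (xi - 1) / 2, 1 - xi**2, xi * (xi + 1) / 2])


def cubic_only_shape(xi):
    """Hypothetical 2-noded set built from {1, xi**3}: the linear term is missing."""
    return np.array([(1 - xi**3) / 2, (1 + xi**3) / 2])


def is_complete(shape, node_xi, samples=np.linspace(-1, 1, 11)):
    """True if a constant field and a linear field are reproduced exactly."""
    node_xi = np.asarray(node_xi, dtype=float)
    for xi in samples:
        N = shape(xi)
        if not np.isclose(N.sum(), 1.0):        # rigid-body translation -> zero strain
            return False
        if not np.isclose(N @ node_xi, xi):     # linear field -> constant strain
            return False
    return True


print(is_complete(linear_shape, [-1, 1]))
print(is_complete(quadratic_shape, [-1, 0, 1]))
print(is_complete(cubic_only_shape, [-1, 1]))
```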

It was conventional wisdom that finite element formulations satisfying the above continuity and completeness conditions converge at a variationally correct rate. However, it was subsequently found that finite elements suffer from many problems, such as locking, delayed convergence and stress oscillations, when applied to a large class of problems: constrained media elasticity, where one or more components of the strain tensor are constrained at a physical limit, and many other classes of problems (table 1). These problems make element formulations virtually unacceptable for general structural applications. Several ad hoc techniques, such as reduced integration, mixed/hybrid methods and the addition of incompatible modes, were normally associated with stress sampling at the so-called optimal/Barlow points. All these techniques offered only selective/partial success from a general purpose point of view and lacked a scientifically sound explanation either for their success or for their failure (if any). Basically, the answers offered to the basic questions, "How confident are we with finite element analysis?" and "How reliable are the finite element solutions?", were not completely satisfying.

In the early 80's, this class of problems was approached with a new scientific understanding (Prathap & Bhashyam 1982) and, over the past one and a half decades, a complete, consistent and correct scientific basis has been established for formulating any general purpose robust finite element for structural applications (e.g. Ramesh Babu (1985), Prathap (1986, 1993, 1994), Naganarayana (1991), Prathap et al (1994a)). In this process, several classes of structural problems that suffer from errors like locking, delayed convergence and stress oscillations in general purpose finite element formulations were identified; appropriate consistency principles were evolved to explain the existence of such errors, their root causes and possible methods of eliminating them in a variationally correct manner (table 1). Several techniques, such as reduced integration, least-squares field reconstitution and Legendre polynomial expansion, can now be reinterpreted with the new scientific rigour for applications in finite element formulations. Here, we shall briefly recapitulate the different aspects of the new understanding: the consistency and correctness paradigms.


Constrained-field consistency: The displacement fields involved in computing the constrained strain energy components should be interpolated such that all the physical constraints on the corresponding strain energy component are fully represented without leading to any spurious constraints in the penalty limits. In other words, the terms in a constrained strain field that have only a partial contribution from the constituent displacement fields, and that lead to spurious constraints in the penalty limits for the corresponding strain energy components, should be eliminated from the formulation to assure the expected rate of convergence from a finite element model of structural problems belonging to the class of constrained media elasticity.
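The constrained-field consistency requirement is easiest to see in the 2-noded Timoshenko beam, the case in which locking was first explained (Prathap & Bhashyam 1982). In the sketch below (a hypothetical cantilever of unit length, unit width and depth 0.001 under a tip load), the shear strain $dw/dx - \theta$ contains a linear term coming only from $\theta$. Integrating the shear energy exactly (2-point Gauss) forces this term to vanish as well in the thin (penalty) limit, which implies zero curvature, a spurious constraint, so the beam locks; 1-point integration retains only the consistent constant part. Material data and mesh are assumptions, and this is not FEPACS code.

```python
import numpy as np


def cantilever_tip_deflection(n_el, shear_points, L=1.0, t=1e-3, E=1.0, nu=0.3, P=1.0):
    """Tip deflection of a cantilever with tip load P, modelled with 2-noded
    linear Timoshenko beam elements (nodal DOFs: w, theta).

    shear_points = 2: exact (full) integration of the shear strain energy
    shear_points = 1: reduced, field-consistent integration
    Hypothetical data: rectangular section of unit width and depth t.
    """
    G, k = E / (2.0 * (1.0 + nu)), 5.0 / 6.0
    EI, kGA = E * t**3 / 12.0, k * G * t
    h = L / n_el
    pts, wts = np.polynomial.legendre.leggauss(shear_points)
    ndof = 2 * (n_el + 1)
    K = np.zeros((ndof, ndof))
    for e in range(n_el):
        dofs = [2 * e, 2 * e + 1, 2 * e + 2, 2 * e + 3]   # w1, theta1, w2, theta2
        Bb = np.array([0.0, -1.0 / h, 0.0, 1.0 / h])     # curvature d(theta)/dx
        ke = EI * h * np.outer(Bb, Bb)                   # bending part (Bb is constant)
        for xi, wt in zip(pts, wts):
            N1, N2 = (1.0 - xi) / 2.0, (1.0 + xi) / 2.0
            Bs = np.array([-1.0 / h, -N1, 1.0 / h, -N2]) # shear strain dw/dx - theta
            ke += kGA * np.outer(Bs, Bs) * wt * h / 2.0  # Jacobian dx = (h/2) dxi
        K[np.ix_(dofs, dofs)] += ke
    F = np.zeros(ndof)
    F[-2] = P                                            # tip load on the last w
    free = np.arange(2, ndof)                            # w = theta = 0 at the root
    u = np.zeros(ndof)
    u[free] = np.linalg.solve(K[np.ix_(free, free)], F[free])
    w_exact = P * L**3 / (3.0 * EI) + P * L / kGA        # bending + shear flexibility
    return u[-2], w_exact


for points, label in [(2, "full"), (1, "reduced")]:
    w_fe, w_ex = cantilever_tip_deflection(8, points)
    print(f"{label:8s} shear integration: w_FE / w_exact = {w_fe / w_ex:.4f}")
```

With 8 elements the fully integrated model recovers only a tiny fraction of the exact tip deflection, while the reduced (consistent) model is within about 1% of it.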


Consistent mapping: In any parametric formulation, the mapping of the strain and stress fields from the natural system (the system in which the element configuration is always undistorted and the displacement fields are interpolated) to the working system (the system in which the problem is defined and the solution is sought) should retain all the true discretised constraints without introducing any additional spurious ones.


Edge consistency: The tangential strain components which are continuous across the element boundary in the undistorted natural coordinate system should remain continuous even after the necessary transformations, and the tangential strain components should be built from their corresponding tangential displacement components only.


Unconstrained-field consistency: The terms in the strain and/or stress fields that do not 
participate in the strain energy computations (and hence in the displacement recovery) 
should not be retained while recovering the corresponding strains and/or stresses in a 
displacement type formulation. 

All the above consistency paradigms suggest some form of field-reconstitution (either of the strain or of the stress fields) to eliminate the associated errors such as locking and stress


FEPACS: A computational tool for linear structural analysis 




oscillations. There are many ad hoc schemes (e.g. reduced integration) to perform this. The field-reconstitution, however, should be performed in accordance with the following variational basis.


Variational correctness: The reconstituted field should be orthogonal to the error it introduces, i.e. to the difference between the original field and the reconstituted field.
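Variational correctness can be illustrated with the Legendre polynomial expansion technique listed above. For a hypothetical quadratic strain field on $[-1, 1]$, dropping the $P_2$ term of its Legendre expansion leaves a linear field that is orthogonal to the error it introduces (it is the least-squares best fit), whereas simply deleting the $\xi^2$ term from the power series does not. The coefficients are arbitrary illustrative values.

```python
import numpy as np
from numpy.polynomial import legendre as leg
from numpy.polynomial import polynomial as poly

# Hypothetical quadratic strain field eps(xi) = 1 + 0.5*xi + 2*xi**2 on [-1, 1]
power_coeffs = np.array([1.0, 0.5, 2.0])


def inner(f, g, n_points=5):
    """L2 inner product on [-1, 1] by Gauss-Legendre quadrature (exact here)."""
    x, w = leg.leggauss(n_points)
    return np.sum(w * f(x) * g(x))


def original(xi):
    return poly.polyval(xi, power_coeffs)


# (a) Variationally correct: expand in Legendre polynomials, drop the P2 term
leg_coeffs = leg.poly2leg(power_coeffs)


def reconstituted(xi):
    return leg.legval(xi, leg_coeffs[:2])


# (b) Naive: discard the xi**2 term of the power series
def truncated(xi):
    return poly.polyval(xi, power_coeffs[:2])


err_a = inner(reconstituted, lambda xi: original(xi) - reconstituted(xi))
err_b = inner(truncated, lambda xi: original(xi) - truncated(xi))
print(f"Legendre truncation: <reconstituted, error> = {err_a:.2e}")
print(f"power truncation:    <reconstituted, error> = {err_b:.2e}")
```

Here the Legendre route gives an inner product of zero to round-off, while the naive truncation gives $4/3$.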

The above paradigms - Correspondence, Continuity, Completeness, Consistency and 
Correctness, collectively called the C-concepts - form a complete scientific basis for robust 
finite element formulations (figure 2). 

4. Finite element library 

The Structural Analysis Program (SAP) is one of the earliest general purpose finite element programs used for structural analysis (Wilson 1970). It was later improved and released as SAP-IV for linear static and dynamic analysis of 3-dimensional structures (Bathe et al 1974). As it was released with its source code in the public domain, it has served as the springboard for many other finite element packages with various improvements. It is well known that SAP-IV has reasonably efficient solution capabilities and data handling procedures. However, its main weaknesses are its obsolete finite element technology and its limited element library (only linear elements that can model only isotropic structures). The idea of developing a general purpose finite element package with the strength of the state-of-the-art finite element technology available at the National Aerospace Laboratories, with an emphasis on application to laminated composite structures under hygro-thermo-mechanical loads, was conceived in the late 80's. FEPACS: Finite Element Package for Analysis of laminated Composite Structures under hygro-thermo-mechanical loads (Prathap et al 1989) was initiated with this in mind. Initially, the package was built around the data handling and program organisation of SAP-IV. Over the years, the element library was replaced with a complete library of consistent and correct 1-, 2- and 3-dimensional, linear and quadratic displacement type finite elements. In this section, we shall briefly describe the salient features of each element in the FEPACS library (figure 3) for linear analysis of composite structures.

4.1 The boundary element 

SPRING - A 2-noded spring/boundary element: This is a simple spring element defined by two nodes in space, one being a specified structural node and the other fixing the spring direction. The spring stiffness can be prescribed along any of the six engineering degrees of freedom, so that any general translational and/or torsional springs can be modelled with this element. Apart from the spring stiffness, it takes a specified displacement along the element axis and a specified rotation about the element axis. It can also be used effectively as a general boundary element, allowing initial displacements (boundary conditions) to be imposed on the structural node. By using a proper combination of displacement and stiffness conditions, it can be used to emulate multi-point constraints in a structural model, to model elastic foundations, etc. Finally, if the plate/shell element formulation cannot support the rotation about the local normal, the boundary element can be used to suppress this degree of freedom locally at the specified node in a general structure.

Figure 2. The C-concepts in the finite element method.

Figure 3. FEPACS-2.1: The element library.
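A minimal sketch of a grounded directional spring of the SPRING type, under assumed conventions: DOF order $(u, v, w, \theta_x, \theta_y, \theta_z)$ at the structural node, and a translational stiffness along the axis plus a rotational stiffness about it, both projected with $n n^T$, where $n$ is the unit vector from the structural node to the direction node. The penalty treatment of a prescribed axial displacement is a common technique assumed here for illustration; the source does not state how FEPACS imposes it. All function names are hypothetical.

```python
import numpy as np


def spring_stiffness(x_node, x_dir, k_axial=0.0, k_torsion=0.0):
    """6x6 stiffness contribution of a grounded spring at a structural node.

    x_node    : coordinates of the structural node
    x_dir     : coordinates of the second node, used only to fix the spring axis
    k_axial   : translational stiffness along the axis
    k_torsion : rotational stiffness about the axis
    Assumed DOF order: (u, v, w, theta_x, theta_y, theta_z).
    """
    n = np.asarray(x_dir, float) - np.asarray(x_node, float)
    n /= np.linalg.norm(n)            # unit vector along the spring axis
    nnT = np.outer(n, n)              # projector onto the axis
    K = np.zeros((6, 6))
    K[:3, :3] = k_axial * nnT         # resists translation along n only
    K[3:, 3:] = k_torsion * nnT       # resists rotation about n only
    return K


def prescribed_axial_displacement(x_node, x_dir, delta, penalty=1e12):
    """Penalty emulation of a prescribed displacement delta along the axis
    (an assumption for illustration, not necessarily the FEPACS approach)."""
    K = spring_stiffness(x_node, x_dir, k_axial=penalty)
    n = np.asarray(x_dir, float) - np.asarray(x_node, float)
    n /= np.linalg.norm(n)
    f = np.zeros(6)
    f[:3] = penalty * delta * n       # equivalent nodal force along the axis
    return K, f
```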

4.2 The truss/bar element 

TRUSS - A 2-noded truss/bar element: A spatial (3-dimensional) truss element is incorporated in FEPACS. This element is based on 1-dimensional elasticity assumptions in the element coordinate system, namely that the structure can take loads and deform only in its axial direction. This element does not involve any inconsistency problems in its original form, and hence the conventional formulation suffices for general purpose use. This element can support both mechanical and thermal loads.
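For the TRUSS element, the standard spatial bar stiffness and the equivalent nodal loads for a uniform temperature change $\Delta T$ can be sketched as follows. The function and argument names are hypothetical, a linear thermal expansion coefficient $\alpha$ is assumed, and this is a textbook formulation rather than the FEPACS source.

```python
import numpy as np


def truss3d(x1, x2, E, A, alpha=0.0, dT=0.0):
    """Stiffness matrix (6x6) and equivalent thermal load vector of a
    2-noded spatial truss element, DOF order (u1, v1, w1, u2, v2, w2)."""
    d = np.asarray(x2, float) - np.asarray(x1, float)
    L = np.linalg.norm(d)
    n = d / L                                   # direction cosines of the bar axis
    nnT = np.outer(n, n)
    K = (E * A / L) * np.block([[nnT, -nnT], [-nnT, nnT]])
    # A free thermal strain alpha*dT is equivalent to equal and opposite axial
    # nodal forces of magnitude E*A*alpha*dT.
    f_th = E * A * alpha * dT * np.concatenate([-n, n])
    return K, f_th


K, f = truss3d([0, 0, 0], [2, 0, 0], E=200.0, A=1.0, alpha=1e-5, dT=100.0)
u2 = f[3] / K[3, 3]        # node 1 fixed: axial displacement of node 2
print(u2)
```

For this bar of length 2 fixed at node 1, the thermal load alone moves node 2 by the free expansion $\alpha\,\Delta T\,L$.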

4.3 The beam elements 

The 3-dimensional elasticity relations are reduced to one-dimensional relations for a beam by using the one-dimensional nature of its geometry (two dimensions are very small compared to the third), with different levels of approximation (theories) such as the elementary theory (Euler-Bernoulli), the first-order shear deformable theory (Timoshenko) and several higher order transverse deformable theories (e.g. Lo-Christensen-Wu). In this section we shall briefly describe a family of beam elements based on the elementary and first-order shear deformable theories.

All the beam elements can be used to model any general laminated composite and anisotropic beam of any geometry in 3-dimensional space. The 3-dimensional orthotropic and anisotropic constitutive relations are statically condensed to their equivalent one-dimensional form. The elements can model any 3-dimensional straight/curved beam/frame structure. The elements can be loaded along all six engineering degrees of freedom. They can also be used effectively as stiffener elements with the plate, shell and solid elements. The eccentricity of the stiffener can be modelled either by defining the beam nodes as slave nodes or by processing the stiffener offset explicitly. Finally, the elements can also model uniform, tapered and stepped beams of any cross-section and curvature, and are always free of locking and stress oscillations.
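Static condensation of the constitutive relations amounts to a Schur complement: the stresses conjugate to the discarded strain components are set to zero, and those strains are eliminated. The sketch below assumes Voigt ordering (11, 22, 33, 23, 13, 12) with engineering shear strains, and retains either $\varepsilon_{11}$ alone (an Euler-Bernoulli type axial modulus) or $\varepsilon_{11}, \gamma_{13}, \gamma_{12}$ (a Timoshenko type set). Which components FEPACS actually retains is not stated in the source; the function names are hypothetical.

```python
import numpy as np


def condense(C, keep):
    """Statically condense a 6x6 constitutive matrix onto the strain components
    in `keep`, assuming zero stress conjugate to all other components."""
    C = np.asarray(C, float)
    drop = [i for i in range(C.shape[0]) if i not in keep]
    Ckk = C[np.ix_(keep, keep)]
    Ckd = C[np.ix_(keep, drop)]
    Cdd = C[np.ix_(drop, drop)]
    return Ckk - Ckd @ np.linalg.solve(Cdd, Ckd.T)    # Schur complement


def isotropic_C(E, nu):
    """3D isotropic stiffness, Voigt order (11, 22, 33, 23, 13, 12)."""
    lam = E * nu / ((1 + nu) * (1 - 2 * nu))
    G = E / (2 * (1 + nu))
    C = np.zeros((6, 6))
    C[:3, :3] = lam
    C[np.arange(3), np.arange(3)] += 2 * G
    C[np.arange(3, 6), np.arange(3, 6)] = G
    return C


C = isotropic_C(70e9, 0.3)
print(condense(C, keep=[0]))          # axial modulus only
print(condense(C, keep=[0, 4, 5]))    # axial plus the two transverse shears
```

For an isotropic material the condensed axial modulus is simply $E$, which is a useful sanity check; for an anisotropic $C$ the same function returns $1/S_{11}$, where $S = C^{-1}$.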

EBEAM2 - A 2-noded Euler-Bernoulli beam element: This element is based on the classical theory of beam flexure (Euler-Bernoulli): a plane normal to the neutral axis remains plane and normal to the neutral axis after deformation. Thus, the element involves only four independent fields: axial extension, axial twist, and transverse deflections along two orthogonal directions in the cross-sectional plane. The transverse deflections must satisfy C¹-continuity, while the other degrees of freedom need satisfy only C⁰-continuity. One-dimensional cubic Hermitian polynomials are used to interpolate the transverse deflection fields, and linear Lagrangian polynomials are used to interpolate the other degrees of freedom and the element geometry. The element is formulated in the Cartesian coordinate system for use as a straight thin beam element. A variant is also provided in curvilinear coordinates so that it can be used as a curved thin beam element. In its original form, this element suffers from membrane locking when used as a curved beam. However, the constrained membrane strain field is made field-consistent using the method of Legendre polynomial expansion, making the element free of locking and stress oscillations.
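
The two interpolation families named above can be illustrated with a short Python sketch. It evaluates the textbook cubic Hermite functions for the transverse deflection of a straight 2-noded element, together with the linear Lagrangian functions used for the other fields. The function names and the local coordinate xi = x/L in [0, 1] are illustrative assumptions, not FEPACS routines.

```python
def hermite_shape(xi: float, length: float) -> tuple:
    """Cubic Hermite shape functions for the transverse deflection w.

    xi = x / length runs from 0 to 1 along the element. The order matches the
    nodal DOFs (w1, theta1, w2, theta2) with theta = dw/dx, so both w and its
    slope are continuous between elements (C1 continuity).
    """
    n1 = 1 - 3 * xi**2 + 2 * xi**3
    n2 = length * (xi - 2 * xi**2 + xi**3)
    n3 = 3 * xi**2 - 2 * xi**3
    n4 = length * (-xi**2 + xi**3)
    return n1, n2, n3, n4


def linear_shape(xi: float) -> tuple:
    """Linear Lagrangian functions (axial, twist DOFs and geometry): C0 only."""
    return 1 - xi, xi


def deflection(xi: float, length: float, dofs) -> float:
    """Interpolate w at xi from nodal values (w1, theta1, w2, theta2)."""
    return sum(n * d for n, d in zip(hermite_shape(xi, length), dofs))


if __name__ == "__main__":
    # The cubic field w = x**3 on an element of length 2 is reproduced exactly.
    L = 2.0
    dofs = (0.0, 0.0, L**3, 3 * L**2)   # w(0), w'(0), w(L), w'(L)
    print(deflection(0.5, L, dofs))     # w at x = 1.0 -> 1.0
```

Because the cubic Hermite set spans all cubics, any deflection field up to cubic order, including the exact Euler-Bernoulli solution for end loads, is represented exactly by one element.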


TBEAM2 - A 2-noded Timoshenko beam element: This 2-noded element is based on the first-order transverse shear flexible (Timoshenko) theory of beam flexure and is compatible with the shell elements of FEPACS: a plane normal to the neutral axis remains plane but need not remain normal to the neutral axis after deformation. Thus, the element involves all six engineering degrees of freedom: axial extension, axial twist, transverse deflections along two orthogonal directions in the cross-sectional plane, and sectional rotations about the two transverse directions. All the degrees of freedom need satisfy only C⁰-continuity. Linear Lagrangian polynomials are used to interpolate all the degrees of freedom and the element geometry. The element is formulated in the Cartesian coordinate system for use as a straight thick or thin beam element. In its original form, it suffers from shear locking as the beam becomes thin. However, the constrained transverse shear strain fields are made field-consistent using the method of Legendre polynomial expansion, making the element free of locking and stress oscillations.
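
A minimal sketch of the field-consistency idea for TBEAM2, assuming the sign convention gamma = dw/dx - theta and a straight element of length L mapped to xi in [-1, 1]. The rotation is expanded in Legendre polynomials (P0 = 1, P1 = xi) and only the term that dw/dx can balance is kept in the transverse shear strain. The function names are hypothetical; FEPACS's own implementation is not given in the source.

```python
def theta_at(xi, t1, t2):
    """Linear rotation field on xi in [-1, 1] from nodal rotations t1, t2."""
    return 0.5 * (1 - xi) * t1 + 0.5 * (1 + xi) * t2


def shear_conventional(xi, length, w1, w2, t1, t2):
    """gamma = dw/dx - theta with the original linear interpolation.

    dw/dx is constant over the element but theta is linear, so gamma carries
    a spurious linear (Legendre P1) term that stiffens thin beams (locking).
    """
    return (w2 - w1) / length - theta_at(xi, t1, t2)


def shear_consistent(length, w1, w2, t1, t2):
    """Field-consistent gamma: keep only the Legendre term dw/dx can balance.

    theta = a0*P0 + a1*P1 with a0 = (t1 + t2)/2 and a1 = (t2 - t1)/2.
    dw/dx has only a P0 term, so the a1*P1 part of theta is dropped.
    """
    a0 = 0.5 * (t1 + t2)
    return (w2 - w1) / length - a0


if __name__ == "__main__":
    # Nodal trace of a constant-curvature (bending) mode: zero end deflections,
    # equal and opposite end rotations. A thin beam should carry no shear here.
    L, w1, w2, t1, t2 = 1.0, 0.0, 0.0, 0.1, -0.1
    print([shear_conventional(x, L, w1, w2, t1, t2) for x in (-1.0, 0.0, 1.0)])
    print(shear_consistent(L, w1, w2, t1, t2))
```

For this 2-noded element the consistent strain coincides with the conventional strain sampled at the element mid-point (xi = 0), which is why one-point integration of the shear term produces the same effect.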


TBEAM3 - A 3-noded curved Timoshenko beam element: A 3-noded Cartesian curved beam element based on the Timoshenko theory of beam flexure, compatible with the quadratic shell elements of FEPACS, is included in the FEPACS library. The element involves all six engineering degrees of freedom, all of which need satisfy only C⁰-continuity. Quadratic Lagrangian polynomials are used to interpolate all the degrees of freedom and the element geometry. The element is formulated in the Cartesian coordinate system for use as a straight or curved, thick or thin beam element. In its original form, it suffers from shear locking, membrane locking, delayed convergence and complex stress oscillations when the beam becomes thin and/or is used to model curved beams under conditions of inextensible flexure (Naganarayana & Prathap 1990; Prathap & Naganarayana 1990). To capture the constrained-field inconsistencies correctly, the element is modelled using four coordinate systems: the global Cartesian system, to model the structure and recover displacements; the element-plane Cartesian system, to interpolate the degrees of freedom in terms of their nodal values; a running Cartesian system, to capture the strain and stress variation; and the natural curvilinear system, to define the field interpolations. The constrained strain fields (membrane and transverse shear) are made field-consistent using the method of Legendre polynomial expansion in the element-plane Cartesian system, making the element free of locking and stress oscillations in its general form. The strain fields are mapped from one system to another in a consistent manner, so that the element behaves accurately even when the mid-node does not lie at the mid-point. The stress resultant fields are reconstituted so that they contain no term that does not participate in the displacement recovery; the element is therefore free of stress oscillations even when its sectional moduli vary, e.g. in tapered beams.
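
For the straight, thin-beam limit of TBEAM3 (mid-node at the element centre, nodes at xi = -1, 0, 1, curvature and membrane terms omitted, and gamma = dw/dx - theta assumed), the Legendre expansion of the rotation field can be written out as below. This is a worked illustration of the technique only; the symbols c_0, c_1, a_0, a_1, a_2 are introduced here, and the curved-element mapping described above is not reproduced.

```latex
\begin{aligned}
\theta(\xi) &= \theta_2 + \frac{\theta_3-\theta_1}{2}\,\xi + \frac{\theta_1-2\theta_2+\theta_3}{2}\,\xi^2
             = a_0 + a_1 P_1(\xi) + a_2 P_2(\xi),
\qquad P_1 = \xi,\; P_2 = \tfrac{1}{2}\left(3\xi^2-1\right),\\
a_0 &= \tfrac{1}{6}\left(\theta_1 + 4\theta_2 + \theta_3\right),\qquad
a_1 = \tfrac{1}{2}\left(\theta_3 - \theta_1\right),\qquad
a_2 = \tfrac{1}{3}\left(\theta_1 - 2\theta_2 + \theta_3\right),\\
\frac{dw}{dx} &= c_0 + c_1\,\xi
\quad\Longrightarrow\quad
\bar{\gamma}(\xi) = \left(c_0 - a_0\right) + \left(c_1 - a_1\right)\xi .
\end{aligned}
```

The quadratic term a_2 P_2 has no counterpart in dw/dx and is therefore dropped. Because P_2 vanishes at xi = ±1/√3, the consistent field equals the linear interpolant of the conventional shear strain sampled at the two-point Gauss points.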

4.4 Elements for 2-dimensional elasticity

A class of linear triangular and quadrilateral elements based on the 2-dimensional elasticity equations is included in FEPACS. All the elements are formulated in an isoparametric sense. The conventional isoparametric formulations suffer from locking (parasitic shear) and stress oscillations (Prathap 1985). Although a straightforward application of the consistency principles can remove these errors, it introduces a new error, Poisson's ratio stiffening, because the element then lacks terms that can represent flexural deformation. The linear interpolation functions are therefore augmented by quadratic incompatible (so-called bubble) functions (Wilson et al 1973), and the consistency paradigm is then applied to eliminate locking (parasitic shear), Poisson's ratio stiffening, and the associated stress oscillations from the elements (Prathap 1993). All the elements can take both mechanical and hygro-thermal loads, and can be used to model any temperature-dependent orthotropic medium.
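
The parasitic-shear error and the role of the bubble functions can be seen in a few lines of Python. The sketch assumes a square element whose physical coordinates coincide with the parent coordinates (x = xi, y = eta), Poisson's ratio nu = 0, and the pure-bending field u = xy, v = -x²/2, for which the exact shear strain is zero. It illustrates only why the incompatible modes are added; the subsequent consistency step applied in FEPACS is not modelled, and all names are hypothetical.

```python
# Corner nodes of the parent square, assumed to coincide with a unit-Jacobian
# physical element so that x = xi and y = eta.
NODES = [(-1.0, -1.0), (1.0, -1.0), (1.0, 1.0), (-1.0, 1.0)]


def bilinear_derivatives(xi, eta):
    """(dN/dxi, dN/deta) for N_a = (1 + xa*xi)(1 + ya*eta)/4."""
    return [(0.25 * xa * (1 + ya * eta), 0.25 * ya * (1 + xa * xi)) for xa, ya in NODES]


def shear_strain(xi, eta, u_nodes, v_nodes, u_bubbles=(0.0, 0.0), v_bubbles=(0.0, 0.0)):
    """gamma_xy = du/dy + dv/dx for a bilinear element plus optional bubble modes.

    Bubble (incompatible) modes: u += b1*(1 - xi**2) + b2*(1 - eta**2), same for v.
    Only the derivatives that enter gamma_xy are needed here.
    """
    d = bilinear_derivatives(xi, eta)
    du_deta = sum(dn[1] * u for dn, u in zip(d, u_nodes)) + u_bubbles[1] * (-2.0 * eta)
    dv_dxi = sum(dn[0] * v for dn, v in zip(d, v_nodes)) + v_bubbles[0] * (-2.0 * xi)
    return du_deta + dv_dxi


if __name__ == "__main__":
    # Pure bending with nu = 0: u = x*y, v = -x**2/2, exact gamma = 0 everywhere.
    u_nodes = [x * y for x, y in NODES]
    v_nodes = [-0.5 * x * x for x, y in NODES]
    print(shear_strain(0.5, 0.0, u_nodes, v_nodes))                        # parasitic shear = xi
    print(shear_strain(0.5, 0.0, u_nodes, v_nodes, v_bubbles=(0.5, 0.0)))  # bubble restores v = -x**2/2
```

The bilinear field cannot contain the x² term in v, so a spurious shear strain proportional to xi appears; the (1 - xi²) mode supplies exactly that term.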


TRIPS3, QUAPS4 - Plane-stress/plane-strain elements: These elements are based on constitutive relationships derived from the theories of plane stress and plane strain. They are always defined in the yz-plane, with the x-axis representing the thickness or axial direction. Accordingly, they can be used effectively to model orthotropic plane-stress or plane-strain problems under thermo-mechanical loads.
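
As an illustration of how the two theories differ, the sketch below builds the isotropic special case of both constitutive matrices, with the in-plane strain ordering (eps_y, eps_z, gamma_yz) chosen to match the yz-plane convention above. The orthotropic matrices used by the elements follow the same pattern but are not reproduced; the function names are hypothetical.

```python
def plane_stress_D(E, nu):
    """Isotropic plane-stress matrix for (eps_y, eps_z, gamma_yz); sigma_x = 0."""
    c = E / (1.0 - nu**2)
    return [[c, c * nu, 0.0],
            [c * nu, c, 0.0],
            [0.0, 0.0, c * (1.0 - nu) / 2.0]]


def plane_strain_D(E, nu):
    """Isotropic plane-strain matrix for (eps_y, eps_z, gamma_yz); eps_x = 0."""
    c = E / ((1.0 + nu) * (1.0 - 2.0 * nu))
    return [[c * (1.0 - nu), c * nu, 0.0],
            [c * nu, c * (1.0 - nu), 0.0],
            [0.0, 0.0, c * (1.0 - 2.0 * nu) / 2.0]]


if __name__ == "__main__":
    Ds, De = plane_stress_D(200.0, 0.3), plane_strain_D(200.0, 0.3)
    print(Ds[0][0], De[0][0])   # plane strain is stiffer in the direct terms
    print(Ds[2][2], De[2][2])   # both equal the shear modulus E / (2 (1 + nu))
```

The shear term is the same in both cases; only the direct and coupling terms change, because plane strain suppresses the out-of-plane (x) strain instead of the out-of-plane stress.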


TRIAX3, QUAX4 - Axisymmetric solid elements: These elements are formulated on the basis of axisymmetric solid mechanics. The elements are defined in the rd- or yz-plane, the x-axis representing the axis of revolution. They can be used to model orthotropic axisymmetric structures under axisymmetric thermo-mechanical loads.


TRIM3, QUAM4 - Membrane elements: The plane-stress elements mentioned above are also defined in 3-dimensional Cartesian space, so that 3-dimensional orthotropic membrane problems under thermo-mechanical loads can be modelled effectively. Each element is first formulated in a local Cartesian plane, capturing its plane-stress behaviour, and the element matrices are then mapped onto 3-dimensional Cartesian space using the usual coordinate transformations.

4.5 Plate/shell elements 

The FEPACS element library contains robust linear and quadratic plate/shell elements based on the first-order shear deformable (Reissner-Mindlin) theory, enabling one to model any kind of laminated composite plate or shell structure subjected to thermo-mechanical loads.


SHEL4 - A 4-noded plane shell element: SHEL4 is a 4-noded quadrilateral plane-shell element that can model thick and thin plates/shells with equal accuracy. The element is formulated in a local mean plane defined by the mid-points of the four edges and then transformed into the global Cartesian system. In its original form, the element suffers from shear locking and the associated stress oscillations. To alleviate these problems, the transverse shear strain fields are made consistent. The transverse shear strain component tangential to the element mid-surface is also matched explicitly from element to element (an edge-consistent formulation), so that the element performs well even when collapsed to a triangle. The effects of element warping (i.e. when the element nodes do not lie in a single plane) are captured by applying a warping correction to the element matrices based on the virtual work principle (Naganarayana & Prathap 1989a).
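
The mean-plane construction can be sketched as follows, assuming nodes numbered consecutively around the element. The four edge mid-points always form a parallelogram, so they define a unique plane through the nodal centroid; the signed distances of the nodes from that plane measure the warping, and for any quadrilateral they alternate as +h, -h, +h, -h. The virtual-work-based warping correction itself (Naganarayana & Prathap 1989a) is not reproduced, and the function name is hypothetical.

```python
import math


def sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1], a[2] * b[0] - a[0] * b[2], a[0] * b[1] - a[1] * b[0])


def mean_plane(nodes):
    """Return (centroid, unit normal, nodal warping offsets) for a 4-noded element."""
    # Mid-points of edges 1-2, 2-3, 3-4, 4-1 (a parallelogram for any quad).
    mids = [tuple((nodes[i][k] + nodes[(i + 1) % 4][k]) / 2 for k in range(3)) for i in range(4)]
    # Normal from the two diagonals of the mid-point parallelogram.
    n = cross(sub(mids[2], mids[0]), sub(mids[3], mids[1]))
    norm = math.sqrt(dot(n, n))
    n = tuple(c / norm for c in n)
    centroid = tuple(sum(p[k] for p in nodes) / 4 for k in range(3))
    # Signed distance of each node from the mean plane (zero for a flat element).
    offsets = [dot(sub(p, centroid), n) for p in nodes]
    return centroid, n, offsets


if __name__ == "__main__":
    warped = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (1.0, 1.0, 0.1), (0.0, 1.0, 0.0)]
    print(mean_plane(warped)[2])   # alternating +h, -h, +h, -h
```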


SSHL8/DSHL8 - 8-noded degenerated shell elements: SSHL8 is an 8-noded element that uses quadratic serendipity functions to interpolate the field variables as well as its geometry in the element-plane Cartesian system. The element deformation is captured in a running local Cartesian system. The 3-dimensional elasticity equations are degenerated to 2-dimensional equations using the assumptions of the Reissner-Mindlin theory. The element is formulated in two versions: the single-surface and the double-surface formulations. In the double-surface formulation, the thickness dimension of 3-dimensional elasticity is modelled using two sets of geometric nodes on the bottom and top surfaces of the element. This version is more efficient for modelling problems involving adjacent elements that do not share common normals at the joining nodes. In the single-surface formulation, on the other hand, the thickness dimension in the 3-dimensional elasticity equations is modelled using the normal to the element mid-surface at the element nodes. Such formulations need only one set of mid-surface nodes and are convenient to use in general. The single-surface formulation is particularly efficient for modelling problems involving adjacent elements that share a common normal to their mid-surfaces.
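
For reference, the quadratic serendipity functions used by the 8-noded elements are sketched below on the parent square xi, eta in [-1, 1]. The node ordering (corners first, then mid-side nodes) is an assumption, not necessarily FEPACS's convention. The 9-noded biquadratic Lagrangian functions of SSHL9 and DSHL9 are instead the tensor products of the 1-D quadratic Lagrange polynomials.

```python
# Corner nodes, then mid-side nodes, in parent coordinates (ordering assumed).
NODES8 = [(-1, -1), (1, -1), (1, 1), (-1, 1), (0, -1), (1, 0), (0, 1), (-1, 0)]


def serendipity8(xi, eta):
    """Quadratic serendipity shape functions N_1..N_8 at (xi, eta)."""
    N = []
    for xa, ya in NODES8:
        if xa != 0 and ya != 0:   # corner node
            N.append(0.25 * (1 + xa * xi) * (1 + ya * eta) * (xa * xi + ya * eta - 1))
        elif xa == 0:             # mid-side node on the edge eta = ya
            N.append(0.5 * (1 - xi**2) * (1 + ya * eta))
        else:                     # mid-side node on the edge xi = xa
            N.append(0.5 * (1 + xa * xi) * (1 - eta**2))
    return N


if __name__ == "__main__":
    # Each function is 1 at its own node and 0 at the others; together they sum to 1.
    print(serendipity8(0.0, 0.0))   # corners -0.25, mid-sides 0.5
```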

These elements, in their conventional form, suffer from transverse shear locking, membrane locking, parasitic shear and the associated stress oscillations. A special technique called line consistency (Naganarayana & Prathap 1989b) is used to make the transverse shear strains completely consistent without introducing any spurious zero-energy modes. A selectively reduced integration strategy and the assumed strain method are used to eliminate membrane locking (by making the membrane strain energy consistent) from DSHL8 and SSHL8 respectively. Therefore, SSHL8 can be used to model laminated composite structures more efficiently than DSHL8. Parasitic shear is also eliminated from the formulations by making the in-plane shear strain component field-consistent. These elements remain very accurate even when moderately distorted.


SSHL9/DSHL9 - 9-noded degenerated shell elements: Single-surface and double-surface elements are also formulated using the biquadratic Lagrangian shape functions, resulting in the corresponding 9-noded formulations SSHL9 and DSHL9. The 3-dimensional elasticity equations are again degenerated to a 2-dimensional formulation in the same way, and the geometric descriptions of the two versions are similar to those of their serendipity counterparts.

These elements, in their conventional form, suffer from transverse shear locking (delayed convergence), membrane locking, parasitic shear and the associated stress oscillations. The transverse shear strain components are made field-consistent using the method of Legendre polynomial expansion without introducing any spurious zero-energy modes. A selectively reduced integration strategy and the assumed strain method are used to eliminate membrane locking (by making the membrane strain energy consistent) from DSHL9 and SSHL9 respectively (not published). Again, SSHL9 is more efficient than DSHL9 in modelling laminated composite structures. Parasitic shear is also eliminated from the formulations by making the in-plane shear strain component field-consistent. These elements remain very accurate even when moderately distorted.

4.6 Elements for 3-dimensional elasticity 

A family of linear and quadratic hexahedral elements has been developed based on an isoparametric formulation of 3-dimensional elasticity. The elements can take orthotropic and anisotropic material properties, and can be used to model any structure subjected to hygro-thermo-mechanical loads.


HEXA8 - An 8-noded hexahedral element: HEXA8 is an 8-noded brick element that uses trilinear Lagrangian interpolation functions for the element geometry as well as the field description. In its original form, it suffers from shear locking when used to model thin structures and from near-incompressibility locking when used to model nearly incompressible materials, with the associated disturbances in stress recovery. It is made free of these errors by augmenting the displacement fields with incompatible (bubble) functions and making the resulting constrained transverse shear and volumetric strains field-consistent (Chandra & Prathap 1989).


HEXA20 - A 20-noded hexahedral element: HEXA20 uses the 3-dimensional quadratic serendipity shape functions to interpolate the element geometry as well as the field variables. In its conventional form, it suffers from both shear locking and near-incompressibility locking. These errors are alleviated using consistency concepts in line with those used for the 8-noded plate elements in Prathap et al (1988). The resulting formulation is free of locking and stress oscillations, but suffers from spurious zero-energy modes when used to model loosely constrained problems.


HEXA27 - A 27-noded hexahedral element: HEXA27 is also an isoparametric formulation, based on the triquadratic Lagrangian shape functions. In its original form, it suffers from locking and/or delayed convergence and stress oscillations. The line-consistency concepts are extended to 3-dimensional plane-consistency, and the method of Legendre polynomial expansion is used to eliminate locking from this element (Naganarayana & Prathap 1991) without introducing any spurious zero-energy mechanisms.


5. The higher order transverse deformable elements 

The demand for accuracy in transverse stress predictions in thick and laminated structures is increasing owing to advanced structural applications, particularly in the aerospace and automobile industries. Such demands are made frequently with the advent of high-precision design strategies, greater computational capabilities and safety awareness. A full-fledged 3-dimensional solution for the transverse stress distribution is very expensive computationally. Recently, several higher-order transverse deformable plate/shell theories and the corresponding finite element formulations have been used to predict transverse stress distributions over the plate thickness more accurately than their elementary and first-order counterparts.

Higher-order flexural deformation in plates has been modelled in many different ways, leading to several higher-order shear deformable theories (Lo et al 1977; Reddy 1984; Liao et al 1992). Of these, the Lo-Christensen-Wu theory is found to be the best candidate for a general purpose finite element package, since it requires only a C⁰-continuous field description.

5.1 Theoretical basis 

The 3-dimensional displacement field is expanded explicitly in terms of the thickness coordinate z, such that the in-plane displacements are interpolated to a cubic level and the transverse displacement to a quadratic level:

$$
\begin{aligned}
U(x,y,z) &= u(x,y) + z\,\theta(x,y) + z^2 u^*(x,y) + z^3 \theta^*(x,y),\\
V(x,y,z) &= v(x,y) + z\,\phi(x,y) + z^2 v^*(x,y) + z^3 \phi^*(x,y),\\
W(x,y,z) &= w(x,y) + z\,\psi(x,y) + z^2 w^*(x,y),
\end{aligned}
\tag{1}
$$

thus reducing the 3-dimensional elasticity problem to a 2-dimensional one. Substituting this displacement field into the standard 3-dimensional strain-displacement relations gives a cubic variation of the in-plane strain components, a parabolic variation of the transverse shear strain components, and a linear variation of the transverse normal strain component across the plate thickness.
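
Differentiating (1) term by term confirms the through-thickness orders just stated (comma subscripts denote partial derivatives, and the small-strain relations are assumed); eps_y and gamma_xy follow the same cubic pattern as eps_x.

```latex
\begin{aligned}
\epsilon_x &= \frac{\partial U}{\partial x}
  = u_{,x} + z\,\theta_{,x} + z^2 u^*_{,x} + z^3 \theta^*_{,x}
  && \text{(cubic in } z)\\
\gamma_{xz} &= \frac{\partial U}{\partial z} + \frac{\partial W}{\partial x}
  = (\theta + w_{,x}) + z\,(2u^* + \psi_{,x}) + z^2\,(3\theta^* + w^*_{,x})
  && \text{(quadratic)}\\
\gamma_{yz} &= \frac{\partial V}{\partial z} + \frac{\partial W}{\partial y}
  = (\phi + w_{,y}) + z\,(2v^* + \psi_{,y}) + z^2\,(3\phi^* + w^*_{,y})
  && \text{(quadratic)}\\
\epsilon_z &= \frac{\partial W}{\partial z} = \psi + 2z\,w^*
  && \text{(linear)}
\end{aligned}
```

Only first derivatives of the mid-surface fields appear, which is why a C⁰-continuous in-plane interpolation suffices for this theory.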

5.2 Finite element technology 

Again, from a general purpose applications point of view, finite element formulations should be free of errors irrespective of element shape and thickness. In the thin-plate limit, the transverse shear and transverse normal strain energy components are constrained to vanish in higher-order shear deformable element formulations. The specific consistency requirements for such elements, and a variationally correct method of achieving them, have recently been developed (Mohan et al 1994), so that higher-order shear deformable elements can model these physical constraints correctly. Note that the transverse shear strains are made consistent in the thickness coordinate by a suitable choice of field description, while the consistency requirements in the in-plane coordinates are achieved explicitly using the method of Legendre polynomial expansion. The elements are then made edge-consistent by matching the tangential transverse shear strain/stress components across element boundaries in any arbitrary patch of elements, and by mapping the constrained strain fields from the natural covariant space to the global Cartesian space in a consistent fashion (Naganarayana et al 1995). Elements based on such formulations are very accurate, in terms of both displacements and stresses, even in highly distorted forms. A library of beam (stiffener) elements (the 2-noded BEAM2 and 3-noded BEAM3) and plate elements (the 4-noded QUAD4 and 9-noded QUAD9), free of locking and stress oscillations, has been developed on the basis of these consistency and correctness principles. Uniform full integration is used in all the element formulations. All the elements behave accurately in linear static, dynamic and instability applications, and all can take laminated material constitutions with each lamina having 3-dimensional orthotropic properties.

The complete library of laminated beam/plate elements currently available is being extended to tackle curvilinear laminated curved-beam/shell problems. This element library forms the special module planned to enhance FEPACS with an extended interlaminar stress analysis capability, and to improve its current capability to handle structures that are very thick and/or highly flexible in transverse deformation. With this complete module of higher-order transverse deformable curved-beam/shell elements, one can model general stiffened laminated composite structures.

6. Linear analysis capabilities 

The power of a general purpose finite element package is reflected by its capability to tackle a wide range of problems in its field of application. Apart from the element technology used, its solution capabilities largely determine its versatility. Today, for structural applications, a general purpose finite element package must have complete linear (static, dynamic and stability) solution capabilities.

Currently, FEPACS has the core solution capability of the original SAP software (linear static and dynamic (lumped mass) solution), which was recently enhanced to handle distributed mass and linear stability as well. The solution can be performed in a single block (in-core solution) or in multiple blocks (out-of-core solution), so the solution capability can be used efficiently for small as well as large problems. The package can therefore handle any number of degrees of freedom, provided the computer platform can support the scratch-file and dynamic memory requirements. The element matrices are assembled into their global counterparts in banded form to optimise memory usage.

In this section, we briefly touch upon the salient features of the solution capabilities of FEPACS and the recent developments under way to improve or replace the present module.

6.1 Linear static analysis 

A typical static structural analysis involves solving a set of simultaneous equations representing structural equilibrium:

$$[K]\{u\} = \{f\}, \tag{2}$$

where [K] is the structural stiffness matrix, {f} is the vector of applied nodal forces, and {u} is the vector of unknown nodal degrees of freedom. A Gauss elimination algorithm is used to decompose the symmetric positive definite system of equations. The algorithm is optimised to a certain extent: no operations are performed on zero elements, and the load vectors are reduced at the same time as the stiffness matrix is decomposed.
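
A minimal Python sketch of symmetric banded Gauss elimination with simultaneous load reduction and skipping of zero entries. It assumes upper half-band storage, in which band[i][k] holds K[i][i+k] and entries beyond the matrix edge are stored as 0.0. The SAP-derived active-column routine and the block (out-of-core) handling used in FEPACS are not reproduced, and the function name is hypothetical.

```python
def solve_banded_symmetric(band, f):
    """Solve K u = f for symmetric positive definite K in upper band storage.

    band[i][k] holds K[i][i+k] for k = 0..hb (hb = half-bandwidth).
    Gauss elimination without pivoting; the load vector is reduced in the
    same sweep as the matrix, and zero multipliers are skipped.
    """
    n = len(band)
    hb = len(band[0]) - 1
    a = [row[:] for row in band]      # work on copies
    b = list(f)
    for i in range(n):                # eliminate below pivot i
        pivot = a[i][0]
        for k in range(1, hb + 1):
            j = i + k
            if j >= n or a[i][k] == 0.0:
                continue              # outside the matrix or zero entry: no work
            factor = a[i][k] / pivot  # K[j][i] / K[i][i], using symmetry
            # Update the upper band of row j: K[j][j+m] -= factor * K[i][j+m]
            for m in range(0, hb + 1 - k):
                a[j][m] -= factor * a[i][k + m]
            b[j] -= factor * b[i]     # reduce the load vector simultaneously
    u = [0.0] * n
    for i in range(n - 1, -1, -1):    # back substitution
        s = b[i]
        for k in range(1, hb + 1):
            if i + k < n:
                s -= a[i][k] * u[i + k]
        u[i] = s / a[i][0]
    return u


if __name__ == "__main__":
    # Three unit springs in series, fixed at one end, unit load at the free end.
    band = [[2.0, -1.0], [2.0, -1.0], [1.0, 0.0]]
    print(solve_banded_symmetric(band, [0.0, 0.0, 1.0]))   # approximately [1, 2, 3]
```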

Once the equilibrium equations are solved for the nodal displacements, the element stresses are computed from the element nodal displacements extracted from the displacement solution, using the same strain-displacement and constitutive relations that were used in computing the element stiffness. Stresses are recovered accurately over the whole element domain, since the element technology is based on the consistency and correctness principles. Stresses can normally be recovered either at the element nodes or at the Barlow points of each element. The stress/strain fields are reconstituted in accordance with the unconstrained-field consistency paradigm, so that the results are variationally correct even when the sectional properties of the structure vary over an element.

6.2 Linear natural frequency analysis 

Free vibration analysis is one of the basic requirements of structural design in many fields of application: aerospace, offshore, seismic, automobile, etc. Typically, the problem is to solve a set of homogeneous equations,

$$[K]\{\phi\} = \omega^2 [M]\{\phi\}, \tag{3}$$

where [M] is the structural mass matrix, and ω and {φ} are the natural frequency and mode shape respectively. This eigenvalue problem is solved for the lowest natural frequencies using one of two solution procedures: the determinant search method and the subspace iteration technique (Bathe & Wilson 1973). The determinant search solution is used when the matrices fit in high-speed (active) memory in a single block; for systems of large order and bandwidth, the subspace iteration method is used, with the equations handled in multiple blocks. Both techniques solve the generalised eigenvalue problem directly, without transforming it to standard form (Bathe & Wilson 1973). Originally, only lumped mass was considered. Recently, a distributed mass option was also added to FEPACS (Naganarayana et al 1993) to give a better model of free structural vibration: since the element library is based on a shear deformable theory, it can then depict the secondary frequency spectra accurately (Bhashyam & Prathap 1981).
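
To illustrate solving the generalised problem (3) directly, without a transformation to standard form, the sketch below uses single-vector inverse iteration, which is the one-vector special case of subspace iteration. It is not the determinant search or the full subspace iteration with Rayleigh-Ritz reduction described by Bathe & Wilson (1973); the dense solver and all names are hypothetical and suitable only for small systems.

```python
def solve(A, b):
    """Dense Gaussian elimination with partial pivoting (small systems only)."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x


def matvec(A, x):
    return [sum(a * b for a, b in zip(row, x)) for row in A]


def lowest_mode(K, M, iters=50):
    """Inverse iteration for the lowest eigenpair of K phi = omega^2 M phi."""
    x = [1.0] * len(K)                      # start vector (must not be M-orthogonal to the mode)
    for _ in range(iters):
        y = solve(K, matvec(M, x))          # K y = M x
        scale = sum(a * b for a, b in zip(y, matvec(M, y))) ** 0.5
        x = [v / scale for v in y]          # M-normalise: x^T M x = 1
    num = sum(a * b for a, b in zip(x, matvec(K, x)))
    den = sum(a * b for a, b in zip(x, matvec(M, x)))
    return num / den, x                     # Rayleigh quotient = omega^2, mode shape


if __name__ == "__main__":
    K = [[2.0, -1.0], [-1.0, 1.0]]          # two unit springs in series, fixed base
    M = [[1.0, 0.0], [0.0, 1.0]]            # unit lumped masses
    lam, phi = lowest_mode(K, M)
    print(lam, lam ** 0.5)                  # omega^2 = (3 - sqrt(5))/2, omega
```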

6.3 Linear dynamic analysis 

For the response history analysis of structures subjected to dynamic loads, we seek the solution of the dynamic equilibrium equations,

$$[M]\frac{d^2\{u\}}{dt^2} + [C]\frac{d\{u\}}{dt} + [K]\{u\} = \{f\}, \tag{4}$$

where [M], [C] and [K] are the mass, damping and stiffness matrices of the structure, and {f} and {u} are the transient nodal force and displacement vectors. The damping matrix is computed as a linear combination of the mass and stiffness matrices, so that modal orthogonality is satisfied. As a variation, the dynamic equilibrium equations can also handle structures subjected to uniform ground acceleration by replacing the right-hand side with $-[M]\,d^2\{u_g\}/dt^2$, where $u_g$ is the ground displacement and {u} is the structural displacement relative to the ground. The dynamic response analysis can be carried out in two ways using FEPACS: natural frequency analysis followed by response history analysis using the mode superposition method, or response history analysis by direct integration (Clough 1962).
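
One common way to form such a combination, assumed here rather than taken from FEPACS, is Rayleigh damping [C] = α[M] + β[K]. Its modal damping ratio is ζ = α/(2ω) + βω/2, so α and β can be chosen to give target ratios at two frequencies. The sketch solves that 2 × 2 system; the function names are hypothetical.

```python
def rayleigh_coefficients(w1, z1, w2, z2):
    """alpha, beta in C = alpha*M + beta*K giving damping ratios z1, z2 at w1, w2 (rad/s).

    From alpha + beta*w**2 = 2*z*w written at the two frequencies.
    """
    beta = 2.0 * (z2 * w2 - z1 * w1) / (w2**2 - w1**2)
    alpha = 2.0 * z1 * w1 - beta * w1**2
    return alpha, beta


def modal_damping_ratio(alpha, beta, w):
    """Damping ratio of a mode with circular frequency w under Rayleigh damping."""
    return alpha / (2.0 * w) + beta * w / 2.0


if __name__ == "__main__":
    a, b = rayleigh_coefficients(10.0, 0.05, 50.0, 0.05)
    print(a, b)                                   # about 0.8333 and 0.001667
    print(modal_damping_ratio(a, b, 30.0))        # below 5% between the two anchors
```

Because [C] is a combination of [M] and [K], it is diagonalised by the undamped mode shapes, which is what allows the mode superposition route mentioned above.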

In another variation of dynamic analysis, with particular reference to seismic problems, the Cartesian components of the ground acceleration can be used to carry out a response spectrum analysis using the concept of spectral displacement (Clough 1962). FEPACS computes the maximum response in each mode, assuming that the spectra (displacements or accelerations) in the three directions are proportional to each other.
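
For a single ground-motion direction, the maximum response of mode i is commonly written as Γ_i φ_i S_d(ω_i), with participation factor Γ_i = φ_iᵀ[M]r / φ_iᵀ[M]φ_i and influence vector r. The sketch below assumes that textbook form and a spectral displacement S_d already read from the spectrum; how FEPACS treats the three directions and combines the modal maxima is not shown, and the names are hypothetical.

```python
def participation_factor(M, phi, r):
    """Gamma = phi^T M r / phi^T M phi (M symmetric)."""
    Mphi = [sum(M[i][j] * phi[j] for j in range(len(phi))) for i in range(len(phi))]
    num = sum(a * b for a, b in zip(r, Mphi))   # r^T M phi = phi^T M r
    den = sum(a * b for a, b in zip(phi, Mphi))
    return num / den


def peak_modal_displacement(M, phi, r, sd):
    """Peak nodal displacements of one mode for spectral displacement sd."""
    g = participation_factor(M, phi, r)
    return [g * p * sd for p in phi]


if __name__ == "__main__":
    M = [[1.0, 0.0], [0.0, 1.0]]
    r = [1.0, 1.0]                              # both DOFs along the excitation direction
    print(peak_modal_displacement(M, [1.0, 2.0], r, 0.1))   # about [0.06, 0.12]
```

The result is independent of how the mode shape is scaled, since Γ_i carries the inverse of that scaling.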

6.4 Linear stability analysis 

The linear buckling analysis of a structural system is carried out as an eigenvalue solution,

$$[K]\{\phi\} = \lambda [G]\{\phi\}, \tag{5}$$

where [G] is the structural geometric stiffness matrix, λ is the critical load factor, and {φ} is the corresponding buckling mode. It is assumed that the pre-buckling structural behaviour is linear and that the internal force distribution is linearly proportional to the applied reference load. The applied reference load multiplied by the critical load factor $\lambda_{cr}$ then gives the buckling load configuration.

Sometimes the internal force distribution is known a priori for a given load configuration. In such cases, the elastic and geometric stiffness matrices are computed for the system and the eigenvalue solution is carried out directly (single-phase strategy) to find the buckling loads and modes. Very often, however, the internal force distribution has to be computed explicitly for a problem with a specified topology, boundary conditions and material constitution. In such cases a multi-phase strategy is adopted: a preliminary static analysis is carried out to find the internal force distribution induced in the structure by the reference loads, which is then used to compute the structural geometric stiffness. The elastic stiffness computed in the first phase and the geometric stiffness computed in the second phase are then fed to the final phase of eigenvalue solution to obtain the critical buckling loads and the associated mode shapes. Thus, FEPACS can be used to analyse simple, well-defined as well as general structural instability problems efficiently and accurately (Naganarayana et al 1993).
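
The three phases can be traced on a deliberately tiny, hypothetical problem: a cantilever column of length L modelled by a single textbook Hermitian beam element (not one of the FEPACS elements), fixed at node 1 and loaded axially at node 2. The sign convention assumed is [K]{φ} = λ[G]{φ} with [G] formed for the compressive reference load. Phase 1 is trivial here because the axial force is statically determinate; the function name is hypothetical.

```python
import math


def cantilever_one_element_buckling(EI, EA, L, P_ref):
    """Three-phase linear buckling of a one-element cantilever column.

    Returns the critical load factor lambda_cr for the reference load P_ref.
    """
    # Phase 1: linear static analysis of the axial problem under the reference load.
    u_axial = -P_ref * L / EA           # tip shortening (compression)
    N = EA * u_axial / L                # internal axial force, negative in compression

    # Phase 2: elastic and geometric stiffness for the free-end DOFs (w2, theta2).
    K = [[12 * EI / L**3, -6 * EI / L**2],
         [-6 * EI / L**2, 4 * EI / L]]
    c = -N / (30 * L)                   # positive for compression (destabilising)
    G = [[36 * c, -3 * L * c],
         [-3 * L * c, 4 * L**2 * c]]

    # Phase 3: det(K - lambda*G) = 0 is a quadratic a*lambda**2 + b*lambda + d = 0.
    a = G[0][0] * G[1][1] - G[0][1] * G[1][0]
    b = -(K[0][0] * G[1][1] + K[1][1] * G[0][0] - K[0][1] * G[1][0] - K[1][0] * G[0][1])
    d = K[0][0] * K[1][1] - K[0][1] * K[1][0]
    disc = math.sqrt(b * b - 4 * a * d)
    return min((-b - disc) / (2 * a), (-b + disc) / (2 * a))


if __name__ == "__main__":
    lam = cantilever_one_element_buckling(EI=1.0, EA=1000.0, L=1.0, P_ref=1.0)
    print(lam, math.pi**2 / 4)          # one-element estimate vs exact EI*pi^2/(4 L^2)
```

With EI = 1, L = 1 and a unit reference load this gives λ_cr ≈ 2.486, within about 1% of the exact cantilever value π²/4 ≈ 2.467.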

6.5 Frontal solution module 

The method of solving the equilibrium equations is a major factor influencing the computational time and memory requirement in general finite element structural analysis. This becomes particularly important in nonlinear and/or dynamic finite element analyses, where several equilibrium iterations have to be performed for several load, time or displacement increments. From the point of view of computational efficiency in time and memory, frontal solution techniques have recently become increasingly popular, since they can offer an edge over conventional active-column solution strategies such as the one in the current version of FEPACS.

In the frontal solution technique (Irons 1970), the complete global matrix is never assembled explicitly. Instead, the assembly of elements and the elimination of variables are carried out simultaneously. Typically, the degrees of freedom associated with a structural node are statically condensed out of the solution system as soon as their interaction with the surrounding nodes is fully established during the continuing assembly of element matrices. The distinctive feature of the frontal method is thus that only the active equations are stored in core memory, which leads to a low core memory requirement. The frontal method has several other advantages as well. Back-substitution for different load vectors requires very little core memory, which is particularly attractive for nonlinear analysis or eigensolution, where many re-solutions with different right-hand-side vectors are required. It is much easier to add static condensation, and therefore substructuring, to the solution capabilities. Finally, since the assembly and solution processes are reasonably parallel in nature, the frontal solution is more convenient for parallelisation and/or vectorisation.
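
A toy, dictionary-based sketch of the frontal idea (Irons 1970): elements are assembled one at a time, each degree of freedom is eliminated as soon as the last element containing it has been assembled, and the eliminated equations are kept for back-substitution. Production frontal codes use fixed-size front arrays, pivoting strategies and out-of-core storage, none of which is modelled; the function and argument names are hypothetical.

```python
def frontal_solve(elements, loads):
    """elements: list of (dofs, ke); a dof index of -1 marks a restrained DOF.
    loads: dict dof -> nodal force. Returns dict dof -> displacement."""
    last = {}
    for e, (dofs, _) in enumerate(elements):
        for d in dofs:
            if d >= 0:
                last[d] = e                   # element after which d is fully summed
    front, rhs, stack = {}, {}, []
    for e, (dofs, ke) in enumerate(elements):
        for i, di in enumerate(dofs):         # assemble the element into the front
            if di < 0:
                continue
            row = front.setdefault(di, {})
            rhs.setdefault(di, loads.get(di, 0.0))
            for j, dj in enumerate(dofs):
                if dj >= 0:
                    row[dj] = row.get(dj, 0.0) + ke[i][j]
        # Eliminate (statically condense) every DOF that is now fully summed.
        for p in [d for d in dofs if d >= 0 and last[d] == e]:
            prow, pr = front.pop(p), rhs.pop(p)
            for r, row in front.items():
                krp = row.pop(p, 0.0)
                if krp == 0.0:
                    continue
                factor = krp / prow[p]
                for c, v in prow.items():
                    if c != p:
                        row[c] = row.get(c, 0.0) - factor * v
                rhs[r] -= factor * pr
            stack.append((p, prow, pr))       # store the pivot equation
    u = {}
    for p, prow, pr in reversed(stack):       # back substitution in reverse order
        s = pr - sum(v * u[c] for c, v in prow.items() if c != p)
        u[p] = s / prow[p]
    return u


if __name__ == "__main__":
    k = [[1.0, -1.0], [-1.0, 1.0]]            # unit spring element
    chain = [([-1, 0], k), ([0, 1], k), ([1, 2], k)]
    print(frontal_solve(chain, {2: 1.0}))     # approximately {2: 3, 1: 2, 0: 1}
```

At no point does the front hold more than the equations of the DOFs shared between assembled and unassembled elements, which is the source of the low memory requirement described above.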

A complete solution module for linear static, instability and dynamic structural analysis is being planned as a stand-alone module, which could subsequently be absorbed into FEPACS. It is also planned to use the frontal solution capabilities in the forthcoming nonlinear finite element modules, which are being developed independently. The module is currently planned to include special modelling facilities to handle multiple load cases, fixed or rigid boundary conditions, prescribed displacement boundary conditions, multipoint constraints, skew boundary conditions, lumped and/or consistent mass matrices, concentrated masses at arbitrary nodes, static condensation and substructuring, and the various types of elements in the current FEPACS version.

In the first phase of the work, the basic frontal algorithm has been developed for static solution, and all the modelling features mentioned above have been incorporated. In the next phase, a general eigenvalue solution capability is to be included so that linear buckling and free vibration analyses can be performed. In the third phase, forced response computation capabilities would be added to bring the solution capabilities on a par with the current regular solution features of FEPACS.


7. Composites applications 

Laminated composite structures are gaining importance in many applications, particularly in aerospace, automobile and naval engineering. Their growing popularity can be traced to the many desirable properties they possess, such as very high strength-to-weight and toughness-to-weight ratios, excellent fatigue resistance, tailorable directional thermo-mechanical properties, and a reduced part count compared with their metallic equivalents. They are now routinely used to produce many structural components. Today, therefore, a major thrust of general purpose computational structural mechanics is to enable one to model general laminated composite structures subjected to hygro-thermo-mechanical loads.

In FEPACS, bending elements may have a general laminated construction in which each lamina can take isotropic, orthotropic or anisotropic material properties, with distinct thicknesses. The formulation for modelling laminated structures with 3-dimensionally orthotropic laminae is presented below. The model can take any number of layers with arbitrary fibre orientation and thickness. The mechanical strains experienced by the structure are given by

$$\{\epsilon_m\} = \begin{Bmatrix} \epsilon + z\kappa \\ \gamma \end{Bmatrix} - \begin{Bmatrix} \epsilon_0 + z\kappa_0 \\ 0 \end{Bmatrix}, \tag{6}$$

where ε, κ and γ are the total membrane, bending and transverse shear strains, and $\epsilon_0$ and $\kappa_0$ are the initial (thermal) membrane and bending strains, proportional to the mid-surface temperature T and the temperature gradient ΔT respectively. The constitutive relation for the anisotropic medium is given by

$$\{\sigma\} = [Q]\{\epsilon_m\}, \tag{7}$$


where [Q] is the constitutive matrix containing 21 independent constants. Normally, laminated structures are composed of laminae with 3-D orthotropic properties, and [Q] can be computed accordingly from the lamina material properties. If P is the potential of the applied loads, the total potential energy of the system is given by

$$\Pi_{MTPE} = \frac{1}{2}\int_V \{\epsilon_m\}^T [Q]\{\epsilon_m\}\, dV - P. \tag{8}$$

Substituting (6) and (7) into (8) and applying the minimum total potential energy principle, we obtain the structural equilibrium equations as

$$\int \Big( \{\delta\epsilon\}^T \big( [A]\{\epsilon\} + [B]\{\kappa\} - [F]\,T - [G]\,\Delta T \big) + \ldots$$

where, if Q and Q represent the constitutive matrices relating the in-plane and transverse deformation respectively, …
and α is the vector of thermal expansion coefficients.

In the case of anisotropic construction, the structural constitutive relations, i.e. the matrices A, B,
D, E, F, G and H, have to be supplied explicitly to FEPACS. This 2-dimensional model
is used in formulating the plate/shell elements SHEL4, SSHL8, DSHL8, SSHL9 and
DSHL9. For the one-dimensional elements EBEAM2, TBEAM2 and TBEAM3, these
2-dimensional relations are statically condensed to get the equivalent one-dimensional
constitutive relation.
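For the common case of orthotropic plies, the membrane, coupling and bending matrices A, B and D can be assembled by classical lamination theory. The sketch below illustrates that assumption only and is not the FEPACS routine: the ply description (E1, E2, nu12, G12, angle, thickness), the bottom-to-top ply ordering and all function names are hypothetical, and the remaining matrices E, F, G and H are not computed.

```python
import numpy as np


def reduced_stiffness(E1, E2, nu12, G12):
    """Plane-stress reduced stiffnesses (Q11, Q22, Q12, Q66) of an orthotropic ply."""
    nu21 = nu12 * E2 / E1
    d = 1.0 - nu12 * nu21
    return E1 / d, E2 / d, nu12 * E2 / d, G12


def transformed_q(E1, E2, nu12, G12, theta_deg):
    """Ply stiffness rotated by theta (degrees) into the laminate axes (Q-bar)."""
    Q11, Q22, Q12, Q66 = reduced_stiffness(E1, E2, nu12, G12)
    t = np.radians(theta_deg)
    c, s = np.cos(t), np.sin(t)
    q11 = Q11 * c**4 + 2 * (Q12 + 2 * Q66) * s**2 * c**2 + Q22 * s**4
    q22 = Q11 * s**4 + 2 * (Q12 + 2 * Q66) * s**2 * c**2 + Q22 * c**4
    q12 = (Q11 + Q22 - 4 * Q66) * s**2 * c**2 + Q12 * (s**4 + c**4)
    q66 = (Q11 + Q22 - 2 * Q12 - 2 * Q66) * s**2 * c**2 + Q66 * (s**4 + c**4)
    q16 = (Q11 - Q12 - 2 * Q66) * s * c**3 + (Q12 - Q22 + 2 * Q66) * s**3 * c
    q26 = (Q11 - Q12 - 2 * Q66) * s**3 * c + (Q12 - Q22 + 2 * Q66) * s * c**3
    return np.array([[q11, q12, q16],
                     [q12, q22, q26],
                     [q16, q26, q66]])


def abd_matrices(plies):
    """plies: list of (E1, E2, nu12, G12, theta_deg, thickness), bottom to top."""
    h = sum(p[5] for p in plies)
    A, B, D = (np.zeros((3, 3)) for _ in range(3))
    z_bot = -h / 2.0  # z measured from the laminate mid-surface
    for E1, E2, nu12, G12, theta, t in plies:
        z_top = z_bot + t
        Qb = transformed_q(E1, E2, nu12, G12, theta)
        A += Qb * (z_top - z_bot)                 # membrane stiffness
        B += Qb * (z_top**2 - z_bot**2) / 2.0     # membrane-bending coupling
        D += Qb * (z_top**3 - z_bot**3) / 3.0     # bending stiffness
        z_bot = z_top
    return A, B, D
```

For a symmetric stack such as [0/90/90/0], B vanishes and the membrane and bending responses decouple; an unsymmetric stack gives a non-zero B, i.e. membrane-flexural coupling.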

The higher order shear deformable laminated composite beam/plate elements (§ 5) are 
also formulated in a similar way. These elements can be effectively used for refined linear 
interlaminar stress analysis and for linear dynamic and instability analysis of laminated 
plates and beams under hygro-thermal and mechanical loads (Mohan et al 1994). 

8. Hygro-thermal loads

Recently, it was observed that displacement-type elements suffer from extraneous stress
oscillations when used to model hygro-thermal loads (Prathap & Naganarayana 1994). This
was shown to be due to a violation of the unconstrained-field consistency paradigm.
Solutions have been offered to eliminate these errors in first-order shear deformable and 1-, 2-
and 3-D elasticity formulations by reconstituting the initial strain fields so that they are consistent
with the corresponding total strain fields. The required orthogonality conditions
are derived by establishing the equivalence of the single-field minimum total potential energy
principle with the general three-field Hu-Washizu principle.
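The oscillation and its cure can be reproduced on the simplest possible case. The sketch below assumes a fixed-fixed bar of 2-node linear elements under a hypothetical temperature T(x) = x (with E = alpha = L = 1 by default), for which the exact stress is the constant -E·alpha·L/2. The "consistent" branch realizes the reconstitution as a least-squares projection of alpha·T onto the element's constant strain field (its element mean), which is one way to satisfy an orthogonality condition of this kind; it is an illustration, not the formulation of Prathap & Naganarayana (1994).

```python
import numpy as np


def bar_thermal_stresses(n_elem, length=1.0, E=1.0, alpha=1.0, consistent=True):
    """Fixed-fixed bar, 2-node linear elements, hypothetical temperature T(x) = x.

    Returns an (n_elem, 2) array of stresses at each element's left and right ends.
    """
    nodes = np.linspace(0.0, length, n_elem + 1)
    K = np.zeros((n_elem + 1, n_elem + 1))
    f = np.zeros(n_elem + 1)
    for e in range(n_elem):
        x0, x1 = nodes[e], nodes[e + 1]
        h = x1 - x0
        K[e:e + 2, e:e + 2] += (E / h) * np.array([[1.0, -1.0], [-1.0, 1.0]])
        # Thermal load: integral of B^T E alpha T dx. B is constant, so only the
        # element mean of T enters (for a linear T, the midpoint value).
        t_mean = 0.5 * (x0 + x1)
        f[e:e + 2] += E * alpha * t_mean * np.array([-1.0, 1.0])
    u = np.zeros(n_elem + 1)  # both end displacements are fixed at zero
    if n_elem > 1:
        u[1:-1] = np.linalg.solve(K[1:-1, 1:-1], f[1:-1])
    stresses = np.empty((n_elem, 2))
    for e in range(n_elem):
        x0, x1 = nodes[e], nodes[e + 1]
        strain = (u[e + 1] - u[e]) / (x1 - x0)  # constant total strain in the element
        if consistent:
            # Initial strain projected onto the constant total-strain field.
            t_left = t_right = 0.5 * (x0 + x1)
        else:
            # Initial strain taken pointwise from the linear temperature field.
            t_left, t_right = x0, x1
        stresses[e] = E * (strain - alpha * t_left), E * (strain - alpha * t_right)
    return stresses
```

With four elements the consistent stresses are -0.5 everywhere, while the pointwise version swings between -0.375 and -0.625 inside every element. The displacements are identical in both cases, because only the element mean of T enters the load vector; the error appears purely in stress recovery.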

The consistent, correct and complete analysis capability has been implemented in FEPACS for
hygro-thermal stress analysis of general laminated composite structures. FEPACS uses a
temperature description that is compatible with the displacement field description.
For example, both the mid-surface temperature and the temperature gradient across the element
thickness are considered in bending elements, so that the coupled membrane-flexural
structural behaviour is captured. This becomes particularly important for laminated
composite structures, where the in-plane and flexural deformations can be implicitly coupled.

The consistency paradigms related to initial strain problems have recently been
extended to the higher-order shear deformable element formulations. It was observed that an
inconsistent hygro-thermal strain field description can also affect the constrained-field
consistency requirements of an element, in addition to the unconstrained-field consistency
requirements mentioned above. The consistency paradigms are applied judiciously, leading
to optimal element formulations. All the elements in the library of higher-order shear
deformable elements have now been reconstituted to treat hygro-thermal problems in laminated
composite structures consistently and correctly. Apart from consistent
hygro-thermo-mechanical stress analysis, it is now also possible to compute the combined influence
of moisture content, temperature distribution and mechanical loads on
structural vibration and buckling. This capability is being consolidated with the
higher-order shear deformable element library module.

9. Nonlinear finite element analysis 

Today, nonlinear structural analysis is becoming more and more important, reflecting
the increasing demand for high-precision applications, the use of advanced materials and
safe design procedures, particularly in aerospace and automobile applications, apart from
conventional fields such as metal forming and many other manufacturing techniques. A
structure can experience mainly two types of nonlinearity, geometric (where the kinematic
relations are nonlinear) and material (where the constitutive relations are nonlinear), apart
from other phenomena like contact, friction etc. Very often nonlinear structural behaviour
is coupled with (macro-)structural instabilities (e.g. buckling) and/or material or
microstructural instabilities (e.g. necking, shear band formation etc.). As a first step toward
a complete nonlinear capability in FEPACS, work is in progress on developing a
fully automated geometrically nonlinear analysis capability for structures with possible
structural instabilities.





B P Naganarayana et al 


Element technology: Efforts have been initiated toward developing an error-free finite
element technology for geometrically nonlinear applications (Naganarayana & Prathap
1996). The causes of nonlinear locking and the associated disturbances in the stresses
have been located and eliminated in a variationally correct manner in formulating robust
curved beam elements. We plan to extend the technology to 2- and 3-dimensional
elements as well, to complete the element library for geometrically nonlinear analysis.


Automated solution strategies: Recently, the steps involved in a typical automated
nonlinear and/or post-buckling solution strategy have been identified, and the strategies
evolved for each step have been compiled and critically examined to find the combinations
that perform best in geometrically nonlinear applications with different kinds of
instabilities, such as limit points and bifurcation points (Naganarayana 1995). Such
strategies are currently being implemented in a modular fashion for general finite
element applications.


10. Pre- and post-processing 

Given the wide use of the finite element method for large real-life problems, it has
become nearly impossible to depend on manual data preparation and result interpretation.
A pre-/post-processor has, therefore, become an integral part of finite element analysis.
Efforts are being made to enhance the utility of FEPACS for real general purpose
applications through a Graphical User Interface (GUI).

Recently, FEPACS was interfaced with MSC-XL, a commercially available pre-/post-processing
package released by MSC that intrinsically supports the MSC/NASTRAN
data structure (Geetha & Naganarayana 1995). The sort and search algorithms of NASTRAN
are emulated in the interface software, FEPNAST, to interpret NASTRAN data
in FEPACS format. The complete element library of FEPACS is not directly supported
by MSC-XL. Data related to the unsupported elements are emulated indirectly; e.g. the
quadratic elements of FEPACS are generated from an appropriate MSC-XL model built
with the linear elements of NASTRAN.
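The idea of generating quadratic elements from a linear mesh can be sketched in a few lines. The example below uses hypothetical function names and reads only GRID and CQUAD4 entries in free-field (comma-separated) bulk data format. It assumes plain decimal numbers (no fixed-field cards, continuations or NASTRAN exponent shorthand) and inserts one shared midside node per edge, turning each 4-node quad into an 8-node quad. It is not FEPNAST, and the output is a neutral Python structure rather than the FEPACS format.

```python
def parse_free_field(deck):
    """Read GRID and CQUAD4 entries from free-field (comma-separated) bulk data."""
    grids, quads = {}, {}
    for line in deck.splitlines():
        fields = [f.strip() for f in line.split(",")]
        card = fields[0].upper()
        if card == "GRID":
            # GRID, ID, CP, X1, X2, X3
            grids[int(fields[1])] = tuple(float(v) for v in fields[3:6])
        elif card == "CQUAD4":
            # CQUAD4, EID, PID, G1, G2, G3, G4
            quads[int(fields[1])] = tuple(int(v) for v in fields[3:7])
    return grids, quads


def to_quadratic(grids, quads):
    """Add shared midside nodes so each 4-node quad becomes an 8-node quad."""
    nodes = dict(grids)
    next_id = max(nodes) + 1
    midside = {}  # edge (lower id, higher id) -> midside node id
    quad8 = {}
    for eid, corners in sorted(quads.items()):
        mids = []
        for a, b in zip(corners, corners[1:] + corners[:1]):
            edge = (min(a, b), max(a, b))
            if edge not in midside:  # create each midside node only once
                nodes[next_id] = tuple((p + q) / 2.0 for p, q in zip(grids[a], grids[b]))
                midside[edge] = next_id
                next_id += 1
            mids.append(midside[edge])
        quad8[eid] = corners + tuple(mids)  # corners first, then edge midpoints
    return nodes, quad8
```

Keying midside nodes by the sorted edge ensures that two elements sharing an edge also share its midside node, which keeps the quadratic mesh conforming.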

Currently, DISPLAY-III, another commercial pre- and post-processing package released by EMRC
that supports the finite element package NISA, is being interfaced with
FEPACS. The data handling structure of NISA is reproduced to interpret the NISA data
deck generated by DISPLAY-III, and the data is written to a file conforming to the FEPACS
format. The interface module, FEPNIS, is currently capable of interfacing the basic linear
elements of the NISA element library for static analysis with FEPACS.

Finally, efforts are being made to develop an indigenous GUI that can be used for pre-
and post-processing in general purpose finite element modelling and analysis, to support
FEPACS. The software is planned in the C language under the UNIX operating
system. The X development tools and Motif are being used for the menu/window
operations and the graphics. A skeletal infrastructure, NALGRAF, has been developed with
slots for the desirable capabilities of a pre- and post-processor, to be filled in due
course. As an initial step, a post-processing module for presentation graphics has been
implemented in the infrastructure. Currently, a general purpose geometric modelling
module, with manual support as well as expert advisor support, is being developed
for incorporation in the infrastructure.


',4 


■ 




11. Structural damage assessment/prediction 

Damage is one of the most important considerations in safe design, particularly in aerospace
and automobile applications. Composites are becoming the preferred candidates for
many structural applications because of their superiority over their metal counterparts,
particularly in specific strength. Their applications are still limited, since their
behaviour is not yet properly understood from a damage mechanics point of view.
Accordingly, failure mechanics of composite structures has been one of the thrust areas of
research in the literature for the past two to three decades.

Recently, work has been initiated to develop a damage module for FEPACS. As a first
step, failure mechanisms in delaminated structures have been critically examined, taking the
geometric nonlinearity of the structure and delamination propagation into consideration.
A simple method has been evolved to model delaminated stiffened composite structures, to
predict delamination growth in terms of the pointwise energy release rate distribution along the
delamination edge, to trace the multiple post-buckling deformation modes, and to predict
the residual strength of the damaged structure (Naganarayana & Atluri 1996).
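The text does not state how the pointwise energy release rate is computed. One widely used approach is the virtual crack closure technique (VCCT), which estimates the energy needed to close the crack over a small area from the nodal forces at the front and the relative displacements just behind it. The sketch below assumes that approach, a local (sliding, opening, tearing) component ordering and a linear mixed-mode growth criterion. These are illustrative choices with hypothetical function names, not necessarily those of Naganarayana & Atluri (1996).

```python
import numpy as np


def vcct_release_rates(front_forces, opening_disp, crack_area):
    """Pointwise energy release rates by the virtual crack closure technique.

    front_forces: (n, 3) nodal forces at the delamination-front nodes,
    opening_disp: (n, 3) relative displacements of the node pairs just behind the front,
    crack_area:   (n,) virtually closed area associated with each front node.
    Columns are ordered (sliding, opening, tearing) in a local front frame.
    """
    F = np.asarray(front_forces, dtype=float)
    du = np.asarray(opening_disp, dtype=float)
    dA = np.asarray(crack_area, dtype=float)
    G = F * du / (2.0 * dA[:, None])  # G = F * delta_u / (2 * delta_A), per component
    return G[:, 1], G[:, 0], G[:, 2]  # G_I (opening), G_II (sliding), G_III (tearing)


def grows(G_I, G_II, G_III, G_Ic, G_IIc, G_IIIc):
    """Assumed linear mixed-mode criterion: growth where the sum of ratios reaches 1."""
    return G_I / G_Ic + G_II / G_IIc + G_III / G_IIIc >= 1.0
```

Evaluating the criterion node by node along the delamination edge indicates where growth would start first, which is the kind of pointwise distribution the paragraph above refers to.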

12. Expert advisor for finite element analysis 

It is very desirable to be able to carry out tasks that involve high levels of expertise, experience
and judgement even in the absence of the required experts. Advances in software technology
have brought this goal within reach to some extent. Expert system activities were initiated
(Naganarayana & Prathap 1992) with the enhancement of FEPACS capabilities in mind.
An expert advisor was first incorporated into a 3-dimensional finite element package for
problem modelling and adaptive mesh control (Prathap & Naganarayana 1992). Later, an
expert advisor was devised for general purpose finite element modelling of 2-dimensional
structures, using the C language in the DOS environment (Naganarayana et al 1994).
It is now planned to develop a full-fledged expert system for general purpose finite element
modelling as a module in NALGRAF, the pre-/post-processor that is being developed at
NAL (see § 10). Currently, as a first step toward this goal, the solid modelling and graphics
capabilities are being enhanced, drawing on the experience and expertise acquired over the
past few years. The software is developed in C. The menu operations and graphics are
supported by the X development tools and Motif in the UNIX operating system environment.

13. Finite element structural optimisation 

A structural optimisation module is being developed for integration with FEPACS. The
module is planned to incorporate several state-of-the-art mathematical programming
techniques available in the literature for structural optimisation (Haftka & Kamat 1985).
In the first phase of development, only constraints based on static considerations will be
considered, with weight minimisation as the objective; the basic structural elements
TRUSS, BEAM2 and SHEL4 will be supported; and a design sensitivity analysis based
on both displacements and stresses will be incorporated.

Using FEA alone for repeated analysis during the optimisation process is quite
expensive. Hence, approximate methods (Schmit & Miura 1976) are devised (e.g. a Taylor
series expansion to define the approximate analysis problem) such that the constraint and
objective function values can be evaluated quickly, with reasonable accuracy and at much
less expense. Currently, each FEA cycle is followed by several approximate analysis cycles,
and the sequence is repeated until the results converge.
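The effect of the approximation can be seen on a hypothetical statically determinate structure whose displacement constraint is d = Σ c_i/A_i in the member areas A_i. The sketch compares a first-order Taylor expansion in the areas with one in the reciprocal areas 1/A_i, a commonly used variant. The function names and the test structure are assumptions for illustration, not part of the FEPACS module.

```python
import numpy as np


def displacement(areas, coeffs):
    """Hypothetical statically determinate structure: d = sum(c_i / A_i)."""
    return np.sum(coeffs / areas)


def displacement_gradient(areas, coeffs):
    """Exact sensitivities dd/dA_i = -c_i / A_i**2."""
    return -coeffs / areas**2


def taylor_direct(x, x0, g0, grad0):
    """First-order Taylor expansion in the design variables themselves."""
    return g0 + grad0 @ (x - x0)


def taylor_reciprocal(x, x0, g0, grad0):
    """First-order Taylor expansion in the reciprocal variables y = 1/x."""
    grad_y = -x0**2 * grad0  # chain rule: dg/dy = (dg/dx)(dx/dy) = -x**2 dg/dx
    return g0 + grad_y @ (1.0 / x - 1.0 / x0)
```

Starting from A = (1, 1) with c = (2, 1) and moving to A = (2, 0.5), the exact value is 3.0, the direct expansion gives 1.5 and the reciprocal expansion gives 3.0, since d is linear in 1/A_i. In a scheme like the one described above, each FEA cycle would supply g0 and grad0, and the approximate cycles would work with the cheap expansion.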

14. Conclusions 

In this paper, we have outlined the research and development activities involved in
developing the general purpose Finite Element Package for Analysis of Composite Structures
(FEPACS). The element technology, modelling and solution capabilities of FEPACS
are briefly described. The current version of FEPACS has state-of-the-art element
technology based on what are called the C-concepts. It incorporates a complete library of finite
elements (1-, 2- and 3-dimensional, linear and quadratic). Each element is free of all
errors of locking, delayed convergence and stress oscillation, even in its most general form.
The package currently has linear static, instability and dynamic (free/forced
vibrations) analysis capabilities. It can model any general isotropic, orthotropic, anisotropic
and laminated composite structure subjected to hygro-thermo-mechanical loads.

Currently, several research and development activities are in progress in an effort to
enhance the FEPACS capabilities, both in terms of utility and applicability to a wider range
of problems. The current linear solution capability of FEPACS is being augmented by an
independent module based on frontal strategies. Work is in progress to evolve an automated
general nonlinear solution capability for finite element applications. From the pre- and
post-processing point of view, FEPACS is now supported by two commercially available
GUIs: MSC-XL on the UNIX platform and DISPLAY-III on the DOS platform. An indigenous
GUI is being developed in the C language using the X development tools and the Motif kit for
graphical support on the UNIX platform. An expert advisor for structural modelling, finite
element modelling and structural analysis is being developed in association with such
a GUI. A module for structural optimisation is also in progress, to be incorporated
with FEPACS. Finally, a module for damage assessment and/or prediction has also been
initiated, particularly for composite applications. All these activities have also been briefly
touched upon in this paper.

Abstract. The increasing complexity of man-made systems calls for new
tools and techniques to model them efficiently and at the desired level of
abstraction. Well-established modelling paradigms, such as finite state machines,
Petri nets, communicating sequential processes etc., which are borrowed from
the fields of computer science and operations research, often lack certain
essential features for capturing discrete event dynamics. New tools such as
statecharts, timed transition models, finitely recursive processes etc., are evolving to
take into account some of these requirements. In this paper we first characterize
such systems as well as typical problems related to them. We then discuss
and critically evaluate several modelling frameworks through examples. At the
end we provide a comparison among the frameworks and directions for future
research.

Keywords. Logical models; discrete event systems; continuous variable systems.


1. Introduction and motivation 

Discrete event systems (DES), and their modelling, analysis, observation and control, have
received increasing attention in recent years. Such systems are nearly always man-made.
Therefore, quite naturally, as man-made systems grow in size and complexity, the need arises
to evolve a systematic theory to design, operate and evaluate them. Examples of such
systems include manufacturing systems, chemical processes, traffic systems, communication
networks, robotic systems etc. Such systems are characteristically viewed as possessing a
discrete state space derived from associated physical variables, and are asynchronous,
event-driven and often nondeterministic, or admit a choice of events by some unmodelled
mechanism. They consist of subsystems which evolve concurrently and interact in the form
of interlocks, communications via channels or shared physical variables. The physical
variables include numeric continuous variables such as a liquid level, numeric discrete
variables such as machine parts in a buffer, or non-numeric variables such as traffic lights,
relay flags or valve positions. Primarily the logical behaviour of the systems, namely the
sequence of events that occurs, is of interest. Structural properties such as liveness,
reachability etc. need to be assessed or ensured by proper design. Furthermore, these
systems are often required to meet hard real-time deadlines. Such systems also arise from state
quantization in continuous variable systems (CVS).

A DES, fundamentally, is a system that has a discrete set of states, in one of which it
exists, and generates (or responds to) sequences of events in the course of its evolution
(or operation). This contrasts with CVS, which have continuous state spaces,
execute continuous (at least piecewise) trajectories, and are described by ordinary or partial
differential equation models. While most physical systems, either man-made or natural,
behave as CVS at a sufficiently high level of resolution, it may be unnecessary to be
concerned with such a description for a given purpose. For example, while designing the
basic configuration of a digital circuit, one is never concerned with the analog “innards”
of the integrated circuits. In fact, man-made systems are often so complex that, in order to
study many aspects of their design and operation, it becomes necessary to adopt a higher
level of abstraction, whereby the underlying continuous dynamics is completely
obliterated. This is often achieved by quantizing the continuous state space into a set of
zones or discrete states, while events label the (discrete) state transitions that occur under
the continuous dynamics. The need arises, therefore, for a new modelling paradigm that
can represent such systems. Our effort in this paper is to introduce some of the
most common modelling frameworks that have been used for this purpose.

Historically, this area has been a very active field of research during the last decade, and
this trend is likely to continue in the years to come. The following arguments and facts are
indicative of its inherent potential.

First, note that with technological advances, the complexity of all things man-made
is increasing. This emphasizes the need for theories, methods and tools to model,
analyse, specify, design and verify such systems at different levels of abstraction. As more
and more complex systems are built to meet new performance standards, there will be no
dearth of challenging problems in this field.

Second, it is interesting to observe that, while much of control theory today is concerned
with the problems of CVS, the design and implementation of industrial control systems is
often more concerned with problems of sequencing (e.g., programmable logic controllers,
real-time distributed control systems), resource scheduling (e.g., level-3 controllers in the
automation pyramid for batch processing plants, job-shop type flexible manufacturing
systems) etc. Research and development in the area of DES are therefore likely to bridge
this gap and should find application in a broad class of industrial control systems.

Third, the discipline of DES is assuming a significantly interdisciplinary character; at
present it comprises a blend of techniques mainly from the areas of computer science,
systems and control, and operations research. Simultaneously, its application domains
range from the design of micro-chips to that of computer communication systems spanning
continents. Interaction among workers from such diverse fields should bring
about fruitful new marriages between application and theory.

The paper is organised as follows: § 2 describes the various features of DES dynamics.
The important logical models, including finite state machines, Petri nets, statecharts, timed
transition models and process algebras, are introduced in § 3. In § 4, the modelling features
of the different frameworks are illustrated with the help of an example drawn from the area
of flexible manufacturing systems. Finally, § 5 concludes the work with a
brief comparison of the models and some directions for future work.

We refrain from calling this a survey of the field for two reasons. First, although
we have included references to all major aspects, we have not attempted an exhaustive
literature review. Second, the work leaves out the large class of timed and stochastic models
of DES, as well as many aspects of control, observation, specification, verification etc., and
concentrates only on modelling the logical behaviour of the systems.


2. Features of the domain 

In this section we discuss the nature of the dynamics to be modelled and the nature of 
problems to be solved in typical contexts involving DES. Both these aspects are important 
for determining the appropriateness of a modelling framework for a given application as 
well as for understanding the motivation for different features of the framework. 

2.1 Nature of DES dynamics 

Various authors have mentioned the typical characteristics of DES dynamics (Ho 1989). 
We classify the features of a DES under the following heads: 

(i) Asynchronism and instantaneity of events: The system trajectory is viewed as a
sequence of events or state transitions. Asynchrony arises because the interacting
components of the system, including the environment, often do not possess
synchronized or global clocks. There are also possibilities of random events,
such as a machine breakdown, which are fundamentally asynchronous. The events
are assumed to be instantaneous mainly for convenience, when one does not want
to model the duration of occurrence of the event and when there is no ambiguity
about the sequence of events. If necessary, one can model the duration of an event by
two subevents, namely, a start-event and a finish-event, each of which can occur
asynchronously, only maintaining the correct sequence. These features also arise naturally
when an underlying CVS exists and a DES state is associated with an (in)equality
involving a point function of CVS signals, while DES events label transitions from
one DES state to another.

(ii) Reactivity: A reactive system is one that has to respond frequently to stimuli from 
the environment. As a result, such systems typically possess features of real-time 
constraints, task preemption, prioritized interrupts etc. 



Amit Patra et al

(iii) Selectivity: A DES often has to choose among alternatives in a deterministic or
non-deterministic manner. A deterministic choice arises when, out of a set of distinct
events, any one can take place, but the actual choice is made by the environment in
which the system is placed. On the other hand, an event taking place in a
non-deterministic choice situation is actually prompted by an internal mechanism which
is not visible to the modeller.

(iv) Concurrency with communication, synchronization and interlocks: Quite often a DES
contains many subsystems that evolve concurrently. However, the dynamics of these
subsystems are invariably interrelated, since all are required to work toward the same
overall objective. The concurrent operation then appears as an interleaving of the
individual event sequences, and the interrelations restrict the possible interleavings.
Thus a particular task may require the joint participation of two or more components, in
which case the corresponding events in the respective processes get synchronized.
Similarly, an interlock may prevent or block the occurrence of an event in one subprocess
until another event occurs in another given subprocess. Interlocks are often needed when
a single resource is to be shared among a number of processes. Communication among the
subprocesses is necessary to ensure synchronization or blocking. This can take place either
by direct broadcast or point-to-point communication, or indirectly via shared variables.

(v) Modularity and hierarchical features: These are typical means of handling large and
complex systems, and man-made systems almost always exhibit these features.
Modularity results when a system is decomposed into units that are nearly autonomous,
have well-defined input-output interfaces and perform a complete, well-defined
subtask. Hierarchy is an attempt to aggregate these modules, retaining sufficient detail
for a given problem and allowing only the minimum necessary information to flow
from a lower to a higher level.
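Synchronization on shared events and free interleaving of private events, as described in (iv), can be made concrete with the synchronous (parallel) composition of two automata. In the sketch below, which uses hypothetical names, each component is a triple (initial state, alphabet, transition dict). A shared event fires only when every component that has it in its alphabet can take it, which also shows how an interlock blocks an event.

```python
from collections import deque


def synchronous_product(m1, m2):
    """Parallel composition of two deterministic automata.

    Each machine is (q0, alphabet, delta) with delta[(state, event)] -> state.
    Shared events must occur jointly; private events interleave freely.
    Only composite states reachable from the joint initial state are built.
    """
    (q1, sigma1, d1), (q2, sigma2, d2) = m1, m2
    start = (q1, q2)
    delta, seen, queue = {}, {start}, deque([start])
    while queue:
        p1, p2 = queue.popleft()
        for e in sorted(sigma1 | sigma2):
            # A component that does not know event e stays where it is.
            n1 = d1.get((p1, e)) if e in sigma1 else p1
            n2 = d2.get((p2, e)) if e in sigma2 else p2
            if n1 is None or n2 is None:
                continue  # blocked: a participating component cannot take e
            nxt = (n1, n2)
            delta[((p1, p2), e)] = nxt
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return start, seen, delta
```

With two components that each perform a private event ('a' or 'b') and then a shared event 's', the shared event cannot fire from (1, 0) because the second component is not yet ready; the interleavings of 'a' and 'b' are restricted exactly in this way.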

2.2 Typical problems 

Here we discuss some typical problems that arise in the specification, design and
operation of DES in various application areas. While there may be several other
domain-specific questions, in the following we restrict ourselves to problems of generic
significance. A good modelling framework should facilitate efficient treatment of
these problems:

(i) Analysing logical behaviour: Some important problems under this category are the 
following. 

(a) Boundedness: This property implies that the state space of the model is finite.
A finite state space ensures that problems on the model are solvable, at least in
principle. However, it is possible to describe the behaviour of the same system
using both bounded and unbounded models (Cieslak & Varaiya 1990). In such
a case, the use of an unbounded model deprives one of the assurance that
solutions exist. Therefore, before attempting to construct algorithms for
solving any problem on a class of system models, one should at least know
whether the problem is decidable for that class. Boundedness is a
sufficient condition for the decidability of these problems.

(b) Reachability: One is often interested in determining whether a particular state 
is reachable from another or any arbitrary initial state. The state information 
may be coded in terms of numerical parameters. A major problem of reachability 
analysis is that of state explosion which leads to high computational complexities. 
To improve the situation, techniques have been suggested to search most probable 
paths (Maxemchuck & Sabnani 1987), to exploit structure and symmetry in the 
model (Brave & Heymann 1993) etc. 

(c) Liveness/deadlock: A system is said to be live if, under any circumstances, there
is at least one event that can take place. Deadlock is the reverse situation, where
the system has halted and cannot proceed further. Deadlock arises when all
component subprocesses of the system have enabled events, but each is blocked by
at least one of the other subprocesses. Such situations are obviously undesirable.
The deadlock problem can be considered a special case of the reachability problem.
The problem of designing computationally efficient algorithms to identify and
prevent deadlock has received wide attention in the process algebra
and parallel programming language research community. The majority of early
work on deadlock and liveness was cast in terms of resource allocation problems,
as in operating systems (Holt 1972). At present, most of the work on deadlock
is formulated using Hoare's CSP formalism (Chandy & Mishra 1979; Apt et al
1980; Hoare 1985; Brookes & Roscoe 1991). A comprehensive treatment of
deadlock can be found in Dathi (1989).

(ii) Program verification and real-time properties: Process algebra-based DES models
(CCS, CSP, DFRP etc.) as well as the TTM model contain many general purpose
programming language features and can be used as rudimentary specification languages
for making unambiguous specifications. Elaborate specification languages like
Ada, OCCAM, LUSTRE, LOTOS, ESTEREL, SIGNAL, CSML etc. have been
developed using process algebra models as the basis, additionally providing syntax
and semantics. Different program verification techniques (Floyd 1967; Hoare 1969;
Manna & Pnueli 1982) can be used to verify properties of the process algebra models.
In these methods, one makes assertions about program execution before and after
every instruction, known respectively as the pre-conditions and post-conditions of the
instruction. Inference rules are used to reason about the whole program
by combining assertions about the instructions. Since, in many cases, real-time and
safety-critical reactive systems are modelled using the process algebra and TTM
models, real-time properties such as safety and real-time constraints are also verified using
program verification methods (Manna & Pnueli 1982; Ostroff 1990).

(iii) Supervision/control: The kind of control one can exercise in a DES is mainly
restrictive, since in general there is no mechanism to force an event to take place.
This restrictive view is adopted to recognise the existence of uncontrollable events
(such as faults), which cannot be prevented. In the absence of such events,
however, one can force an event to occur by preventing all other enabled events from
occurring. One is therefore interested in knowing whether inadmissible
or non-optimal behaviour of the system can be prevented by systematically disabling a
set of controllable events (Ramadge & Wonham 1987a; Wonham & Ramadge 1987,
1988; Passino 1989; Holloway & Krogh 1990; Chung et al 1992; Kumar et al 1993b).
The DES whose task is to keep track of the dynamic evolution of the controlled DES
and decide on the control action to be exercised is known as a supervisor. The design
of a supervisor is difficult for many reasons, such as nondeterminism in the plant, the
existence of uncontrollable events such as machine breakdown, hard time constraints,
insufficient information feedback from the plant etc. The desired behavioural pattern
can be stated in several ways, for example in the form of a set of reference trajectories,
or by specifying a set of states to be avoided/visited infinitely often. The last
requirement is sometimes related to a notion of stability of the system, and the system is
said to be stabilizable if this can be achieved under control (Brave & Heymann 1990;
Ozveren et al 1991; Ozveren & Willsky 1991; Kumar et al 1993).

(iv) Observation: The events taking place in a DES may not always be observed by other
agents in its environment, for example when the system is physically distributed over
a vast geographical area, as in communication networks. It may then be necessary
to design a supervisor under partial observation. One is therefore interested in
determining whether the DES is observable, i.e., whether its state can be determined
from the observed sequence of events alone. If this cannot be ensured at every
instant, it may sometimes suffice to know the state intermittently, with only a finite
number of events in between (Cieslak et al 1988; Lin & Wonham 1988a; Ozveren &
Willsky 1990; Bose et al 1994).

(v) Performance analysis: Apart from the problems discussed above, which are
predominantly qualitative, one often needs to answer quantitative questions too.
For example, in a flexible manufacturing system (FMS), one may like to know the
production rate or the capacity utilization. Such quantitative measures are necessary
for solving high-level problems such as production planning and resource scheduling
in FMS, message routing in communication networks, traffic management in
transportation systems etc. Most often these questions are answered through extensive
simulation. However, since our objective in this paper is to focus only on the logical
aspects of a DES, we shall not delve into this area.
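Several of the problems above reduce to searches on a finite transition structure. The sketch below assumes a fully observed deterministic FSM, and the plant and function names are hypothetical. It computes the reachable states, flags reachable states with no enabled event as deadlocked (marked states are ignored for simplicity), and finds the controllable transitions a supervisor must disable so that a set of bad states is never reached when some events are uncontrollable. It follows the general state-avoidance idea rather than any specific algorithm from the cited works.

```python
from collections import deque


def successors(delta):
    """Adjacency list {state: [(event, next_state), ...]} from delta[(state, event)]."""
    succ = {}
    for (p, e), r in delta.items():
        succ.setdefault(p, []).append((e, r))
    return succ


def reachable(delta, q0):
    """Breadth-first search over the transition graph from q0."""
    succ, seen, queue = successors(delta), {q0}, deque([q0])
    while queue:
        for _e, r in succ.get(queue.popleft(), []):
            if r not in seen:
                seen.add(r)
                queue.append(r)
    return seen


def deadlocked(delta, q0):
    """Reachable states at which no event is enabled."""
    succ = successors(delta)
    return {q for q in reachable(delta, q0) if not succ.get(q)}


def state_avoidance(delta, q0, bad, uncontrollable):
    """Controllable transitions to disable so that no bad state is ever reached.

    Returns None if q0 can reach a bad state through uncontrollable events alone.
    """
    weak = set(bad)  # states from which uncontrollable events alone can reach `bad`
    changed = True
    while changed:
        changed = False
        for (p, e), r in delta.items():
            if e in uncontrollable and r in weak and p not in weak:
                weak.add(p)
                changed = True
    if q0 in weak:
        return None
    return {(p, e) for (p, e), r in delta.items()
            if r in weak and p not in weak and e not in uncontrollable}
```

In a small machine model where 'overload' (controllable) leads to a stressed state from which 'fail' (uncontrollable) leads to a crash, the supervisor disables 'overload' at the idle state; starting from the stressed state itself, no supervisor exists.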


3. Logical models of DES 

The logical models of DES can be classified broadly as state-based models, where the dynamics is determined by process states, and trace-based models, where the dynamics is described in terms of the event sequences generated in the system. The state-based models considered in the following are the Finite State Machine, the Statechart, the Petri Net and the Timed Transition Model. The trace-based models are represented by the Finitely Recursive Processes.
3.1 Finite state machine (FSM)

The finite state machine is perhaps the oldest and the simplest model of DES and originates from the theory of automata and languages (Hopcroft & Ullman 1979). An FSM is formally defined to be a 5-tuple

$$G = (Q, \Sigma, \delta, q_0, Q_m),$$

where $Q$ is the set of states, $\Sigma$ is the alphabet of events, $\delta : \Sigma \times Q \to Q$ is the deterministic transition function, $q_0$ is the initial state and $Q_m \subseteq Q$ is a subset of states, to be called marked states. These states usually signify the completion of a task, which is not necessarily termination in the sense of ‘final’ states. $Q$ and $\Sigma$ are always assumed to be finite. In general $\delta$ is only a partial function, and $G$ is equivalent to a directed graph with node set $Q$ and edges $q \xrightarrow{\sigma} q'$ ($q, q' \in Q$) labelled $\sigma \in \Sigma$ for each triple $(\sigma, q, q')$ such that $q' = \delta(\sigma, q)$.
The transition function $\delta$ can be naturally extended to be defined over strings of events,

$$\delta : \Sigma^* \times Q \to Q,$$

where $\Sigma^*$ denotes the set of all finite-length strings $s$ of elements of $\Sigma$, including the empty string $\langle\rangle$. A language over $\Sigma$ is a subset of $\Sigma^*$. The language generated by $G$ is given by

$$L(G) := \{ w \in \Sigma^* : \exists\, q' \in Q,\ q' = \delta(w, q_0) \},$$

and is necessarily prefix closed. That is, if $s \in L(G)$, then all prefixes of $s$ also belong to $L(G)$. We denote the prefix closure of a language $L$ as $\bar{L}$. Figure 1 presents an example FSM; the language it generates is given by $\ldots(\ldots + \gamma)\delta^*$, where $\delta^*$ denotes zero or more occurrences of the event $\delta$. Among the elements of $\Sigma$ there is a subset of events that can be prevented from occurring (disabled) by external control. By selectively disabling such events by means of a control mechanism, one obtains controlled behaviour. Formally, let $\Sigma_c \subseteq \Sigma$ be the set of controllable events and $\Gamma = \{0, 1\}^{\Sigma_c}$ be the set of all binary assignments to the elements of $\Sigma_c$. Then, corresponding to each such assignment $\gamma \in \Gamma$, one obtains a ‘control function’ $\gamma : \Sigma_c \to \{0, 1\}$. In particular, an event $\sigma$ is said to be enabled if $\gamma(\sigma) = 1$. For an uncontrollable event $\sigma \in \Sigma - \Sigma_c$ one can define $\gamma(\sigma) = 1$ and extend the map $\gamma$ accordingly. Then for the controlled system one can define an augmented transition function

$$\delta_c : \Gamma \times \Sigma \times Q \to Q,$$

such that $\delta_c(\gamma, \sigma, q) = \delta(\sigma, q)$ if $\delta(\sigma, q)$ is defined and $\gamma(\sigma) = 1$, and $\delta_c(\gamma, \sigma, q)$ is undefined otherwise. Thus in a controlled DES the possible transitions from a given state are the controllable ones that are not disabled and the uncontrollable ones, which cannot be disabled. Such a controlled DES is given by the FSM

$$G_c = (Q, \Gamma \times \Sigma, \delta_c, q_0, Q_m).$$

Figure 1. Example of a finite state machine.
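To make the definitions of $\delta$, $L(G)$ and $\delta_c$ concrete, a minimal Python sketch follows. The machine it encodes (states idle, working, down; events start, finish, break, repair; controllable set {start, repair}) is a hypothetical example, not the FSM of figure 1, and the class and method names are illustrative only. The partial function $\delta$ is stored as a dictionary keyed by (event, state), so a missing key stands for ‘undefined’, and the extension of $\delta$ to strings is simply repeated lookup.

```python
from typing import Dict, Optional, Set, Tuple


class FSM:
    """Deterministic FSM G = (Q, Sigma, delta, q0, Qm) with a partial transition function."""

    def __init__(self, delta: Dict[Tuple[str, str], str], q0: str, marked: Set[str]):
        self.delta = delta    # (event, state) -> next state; a missing key means "undefined"
        self.q0 = q0
        self.marked = marked

    def run(self, word, q: Optional[str] = None) -> Optional[str]:
        """Extended transition function delta(w, q); returns None if undefined."""
        q = self.q0 if q is None else q
        for sigma in word:
            q = self.delta.get((sigma, q))
            if q is None:
                return None
        return q

    def generates(self, word) -> bool:
        """w belongs to L(G) iff delta(w, q0) is defined."""
        return self.run(word) is not None

    def step_controlled(self, gamma: Dict[str, int], sigma: str, q: str,
                        controllable: Set[str]) -> Optional[str]:
        """delta_c(gamma, sigma, q): defined only if delta(sigma, q) is defined and
        gamma(sigma) = 1; uncontrollable events are treated as always enabled."""
        is_enabled = (gamma.get(sigma, 0) == 1) if sigma in controllable else True
        return self.delta.get((sigma, q)) if is_enabled else None


# Hypothetical machine: idle --start--> working --finish--> idle,
#                       working --break--> down --repair--> idle.
machine = FSM(
    delta={("start", "idle"): "working", ("finish", "working"): "idle",
           ("break", "working"): "down", ("repair", "down"): "idle"},
    q0="idle",
    marked={"idle"},
)
CONTROLLABLE = {"start", "repair"}
```

With this encoding, `machine.generates(("start", "finish", "start"))` is true and `machine.generates(("finish",))` is false; `machine.step_controlled({"start": 0}, "start", "idle", CONTROLLABLE)` returns `None` because the control pattern disables start, while the uncontrollable break still occurs under the same pattern.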

The mechanism that switches the control functions of a controlled DES to satisfy certain constraints, namely the supervisor, consists of a pair of entities given by

$$\mathcal{S} = (S, \Phi),$$

where

$$S = (X, \Sigma, \xi, x_0, X_m)$$

is a deterministic automaton, just as $G$ is, with state set $X$, input alphabet $\Sigma$, transition function $\xi : \Sigma \times X \to X$, initial state $x_0$ and marked subset $X_m \subseteq X$, and

$$\Phi : X \to \Gamma$$

is a total function that maps each supervisor state $x$ into a ‘control pattern’ $\gamma \in \Gamma$. A control pattern is a tuple of 0s and 1s, representing the control actions chosen for the controllable events. Formally, for each $x \in X$, $\gamma := \Phi(x) \in \{0, 1\}^{\Sigma_c}$. $\Phi$ is known as the state feedback map.

The uncontrolled plant and the supervisor are connected in a feedback fashion (see 
figure 2 ), and each event taking place in the plant is also thought to be taking place in 
the supervisor synchronously. In response to each event both plant and supervisor states 
change, a new control pattern is chosen by the supervisor state and is applied to the plant. 
Often a proper supervisor is of interest where transitions are defined for every possible 
event in the controlled plant and every state trajectory of the closed loop process can be 
extended to reach the set of marked states of the plant as well as that of the supervisor. 
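The feedback connection can be sketched by running the plant and the supervisor in lockstep, with $\Phi$ choosing the control pattern before each event. The plant, the supervisor states x0 and x1, and the rule ‘start the machine at most once’ are hypothetical, chosen only for illustration; the sketch enumerates the closed-loop language $L(\mathcal{S}/G)$ up to a length bound by breadth-first search.

```python
from collections import deque

# Hypothetical plant G: (event, state) -> next state.
PLANT_DELTA = {
    ("start", "idle"): "working",
    ("finish", "working"): "idle",
    ("break", "working"): "down",
    ("repair", "down"): "idle",
}
CONTROLLABLE = {"start", "repair"}

# Hypothetical supervisor S = (X, Sigma, xi, x0, Xm) and feedback map Phi: X -> Gamma.
# It lets the machine be started only once.
SUP_XI = {
    ("start", "x0"): "x1",
    ("finish", "x1"): "x1",
    ("break", "x1"): "x1",
    ("repair", "x1"): "x1",
}
PHI = {
    "x0": {"start": 1, "repair": 1},
    "x1": {"start": 0, "repair": 1},
}


def closed_loop_language(max_len):
    """Strings of L(S/G) up to max_len, running plant and supervisor in lockstep."""
    language = {()}
    frontier = deque([((), "idle", "x0")])
    while frontier:
        word, q, x = frontier.popleft()
        if len(word) == max_len:
            continue
        gamma = PHI[x]  # control pattern selected by the current supervisor state
        for (sigma, q_from), q_next in PLANT_DELTA.items():
            if q_from != q:
                continue
            if sigma in CONTROLLABLE and gamma[sigma] == 0:
                continue  # controllable event disabled by the supervisor
            x_next = SUP_XI.get((sigma, x))
            if x_next is None:
                # a proper supervisor must follow every event the closed loop can execute
                raise ValueError(f"supervisor has no transition for {sigma!r} in {x!r}")
            new_word = word + (sigma,)
            language.add(new_word)
            frontier.append((new_word, q_next, x_next))
    return language
```

Here `closed_loop_language(10)` yields only five strings: the empty string, start, start finish, start break, and start break repair, since after the first start the pattern $\Phi(x_1)$ keeps start disabled.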

The closed loop system involving the plant $G$ and the supervisor $\mathcal{S}$ is usually denoted as $\mathcal{S}/G$. The language generated by $\mathcal{S}/G$, namely $L(\mathcal{S}/G)$, should satisfy certain objectives. However, it is not possible to achieve an arbitrarily specified language by control. Given $\Sigma$ and $\Sigma_c$, one can generate strings only from certain controllable sublanguages of $L(G)$. A language $K \subseteq \Sigma^*$ is said to be controllable (with respect to $G$) if

$$\bar{K}\Sigma_u \cap L(G) \subseteq \bar{K}$$

(Ramadge & Wonham 1987a; Wonham & Ramadge 1987), where $\bar{K}$ denotes the prefix closure of $K$ and $\Sigma_u = \Sigma - \Sigma_c$ is the set of uncontrollable events. This implies that only those sublanguages $K$ of $L(G)$ can be generated for which every string of the form $t\sigma \in L(G)$, with $t \in \bar{K}$ and $\sigma \in \Sigma_u$, is also in $\bar{K}$. That is, a controllable language is one for which any prefix of any string from that language, followed by an uncontrollable event that the plant can execute, still remains in the prefix closure of the specified target (i.e., admissible) language and hence can be extended to complete strings in that language.

Figure 2. Closed loop configuration of a DES plant and supervisor (each plant event σ is passed to the supervisor, which returns a control pattern to the plant).

The supremal controllable sublanguage of $L$ is defined to be the largest such language (Wonham & Ramadge 1987)

$$\sup C(L) := \bigcup \{K : K \subseteq L \text{ and } K \text{ is controllable}\}.$$
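For finite languages, the controllability condition and the supremal controllable sublanguage can be checked by brute force. The Python sketch below represents strings as tuples of event names and languages as sets of tuples; the plant language `L_G` is a small hypothetical finite fragment, and the specification ‘the machine must never break’ is assumed for illustration. The fixpoint in `sup_controllable` is valid for prefix-closed languages only (the input is replaced by its prefix closure), and it is a simple textbook-style iteration over enumerated strings, not the automaton-based algorithms of the cited papers.

```python
def prefix_closure(lang):
    """All prefixes (including the empty string ()) of all strings in lang."""
    return {w[:i] for w in lang for i in range(len(w) + 1)}


def is_controllable(K, L_G, uncontrollable):
    """Test  closure(K) Sigma_u  ∩  L(G)  ⊆  closure(K)  for finite languages."""
    K_bar = prefix_closure(K)
    return all(
        t + (sigma,) in K_bar
        for t in K_bar
        for sigma in uncontrollable
        if t + (sigma,) in L_G
    )


def sup_controllable(K, L_G, uncontrollable):
    """Supremal controllable sublanguage of the prefix closure of a finite K ⊆ L(G).
    Repeatedly deletes every string from which an uncontrollable event the plant
    can execute leaves the language, together with all of its extensions."""
    current = prefix_closure(K)
    while True:
        bad = {
            t for t in current
            if any(t + (s,) in L_G and t + (s,) not in current for s in uncontrollable)
        }
        if not bad:
            return current
        current = {w for w in current if not any(w[:len(b)] == b for b in bad)}


# Hypothetical finite fragment of L(G) and uncontrollable events.
L_G = prefix_closure({("start", "finish"), ("start", "break", "repair")})
U = {"finish", "break"}
# Hypothetical specification: the machine must never break.
K_no_break = prefix_closure({("start", "finish")})
```

In this example `is_controllable(K_no_break, L_G, U)` is false, since break cannot be disabled once the machine has started, and `sup_controllable(K_no_break, L_G, U)` collapses to `{()}`: the only admissible policy is never to start.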

Thus all target specification languages to be achieved by control must be subsets of 
sup C(L). Although supervisors designed on the basis of language controllability are 
nonblocking, sometimes the resultant language may be too conservative and impractical, 
especially when the number of uncontrollable events is large. To handle such situations, 
the concept of an infimal controllable superlanguage has been defined and applied to the 
problem of supervisory control with blocking (Lafortune & Chen 1990). Another characterisation of the language has been given in terms of an infimal prefix closed and observable
superlanguage (Rudie & Wonham 1990). Further studies on such language aspects have 
been reported (Brandt et al 1990; Chen & Lafortune 1991; Kumar et al 1991; Yang et al 
1995). 

A major disadvantage of the FSM model is that it is structurally ‘flat’ and the size of the state space increases exponentially with the complexity of the problem. For example, when two processes operate concurrently with only a few synchronized events, the combined system is given by a shuffle operator on the component automata, and the size of its state space is of the order of the product of the sizes of the component state spaces. To tackle this problem, a modular approach to supervisor construction has been proposed (Ramadge & Wonham 1987b; Wonham & Ramadge 1988). This is achieved by splitting the control task into several subtasks that can be solved with the existing theory, and combining the resulting subcontrollers to obtain the solution of the original problem.
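The growth of the state space under concurrent composition can be seen by building the reachable part of a synchronous product. In the Python sketch below, shared events must occur in both components simultaneously and private events interleave; the cyclic counters produced by the helper `cycle` are hypothetical stand-ins for two concurrent processes.

```python
from collections import deque


def sync_product(d1, q01, E1, d2, q02, E2):
    """Reachable part of the synchronous product of two deterministic FSMs.
    d1, d2 map (event, state) -> state. Shared events (in E1 & E2) must occur in
    both components at once; private events interleave (the shuffle case)."""
    delta = {}
    start = (q01, q02)
    seen = {start}
    queue = deque([start])
    while queue:
        q1, q2 = queue.popleft()
        for e in E1 | E2:
            n1 = d1.get((e, q1)) if e in E1 else q1   # a component not owning e stays put
            n2 = d2.get((e, q2)) if e in E2 else q2
            if n1 is None or n2 is None:
                continue  # blocked by a component that owns the event
            delta[(e, (q1, q2))] = (n1, n2)
            if (n1, n2) not in seen:
                seen.add((n1, n2))
                queue.append((n1, n2))
    return seen, delta


def cycle(n, event):
    """Hypothetical n-state counter advanced by a single event."""
    return {(event, k): (k + 1) % n for k in range(n)}


# Disjoint alphabets: pure shuffle.
states, _ = sync_product(cycle(3, "a"), 0, {"a"}, cycle(3, "b"), 0, {"b"})
print(len(states))  # 9
# One shared event: the components move in lockstep.
states, _ = sync_product(cycle(3, "c"), 0, {"c"}, cycle(3, "c"), 0, {"c"})
print(len(states))  # 3
```

With disjoint alphabets all 3 × 3 = 9 state pairs are reachable, whereas full synchronization leaves only 3, which is the product-size behaviour described above.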

Modular supervisors can be implemented in a decentralized or hierarchical fashion. However, each subsystem may require information about events in other subsystems, which may not always be accessible. This raises the question of supervision under partial observation, or that of observability. To model this situation, the observation alphabet $\Sigma_o$ and the projection (mask) function $P : \Sigma \to \Sigma_o \cup \{\langle\rangle\}$ are brought in. A special case is that of natural projection, where the unobservable events in $\Sigma - \Sigma_o$ project to the null event and are therefore simply erased from the generated string to form the observed string. Loosely speaking, a language $K \subseteq L(G)$ is defined to be observable if the projection $P$ retains sufficient information to decide whether or not, after the occurrence of some observable event, the resulting string is in $K$ (Lin & Wonham 1988a; Cieslak et al 1988).
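The natural projection is a simple erasing map. The following Python sketch assumes, hypothetically, that break and repair are unobservable to the agent; two different strings then produce the same observation, which is exactly the kind of ambiguity that observability is concerned with.

```python
def natural_projection(word, observable):
    """Natural projection P: erase unobservable events, keep observable ones in order."""
    return tuple(sigma for sigma in word if sigma in observable)


def project_language(lang, observable):
    """P(L) = {P(s) : s in L}."""
    return {natural_projection(s, observable) for s in lang}


# Hypothetical observation alphabet: 'break' and 'repair' are unobservable.
OBSERVABLE = {"start", "finish"}
obs1 = natural_projection(("start", "break", "repair"), OBSERVABLE)
obs2 = natural_projection(("start",), OBSERVABLE)
# obs1 == obs2 == ("start",): after seeing only "start", the agent cannot tell
# whether the machine is still working or has broken down and been repaired.
```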

Another characterization of observability has been given in terms of the state. A plant is said to be (state) observable if, by observing a sufficiently long string in $P(L(G))$, the state of the system can be determined exactly after a finite number of events (Ozveren & Willsky 1990; Bose et al 1994). Concepts of stability and stabilization have also been introduced in terms of the state space (Brave & Heymann 1990; Ozveren et al 1991; Ozveren & Willsky 1991). Although definitions of stability differ slightly, most authors define a system to be stable if its trajectories visit a specified subset of the state space infinitely often or, after entering it, remain there. More recently, however, a notion of stability has been defined in terms of language theory (Kumar et al 1993). According to this, a language $K$ is said to be $L$-stable if there exists a finite integer $N^*$ such that for every $s \in K$ there exists $N \le N^*$ such that $s = s_1 s_2$, $|s_1| = N$ and $s_2 \in L$. This implies that after a finite number of events (from the initial state), the system behaviour conforms to the desired behaviour. This concept is similar to that of asymptotic stability of CVS.

Figure 3. Finite state machine model of a robot.
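For finite sets of strings, the $L$-stability condition can be checked directly by trying every split point $N \le N^*$. The languages in this Python sketch are hypothetical, and a finite check of this kind only illustrates the definition; a practical test for regular languages would operate on automata rather than on enumerated strings.

```python
def is_L_stable(K, L, n_star):
    """Check, for finite K and L, that every s in K splits as s = s1 s2
    with |s1| = N <= n_star and s2 in L."""
    return all(
        any(s[n:] in L for n in range(min(n_star, len(s)) + 1))
        for s in K
    )


# Hypothetical desired behaviour L (only "ok" events) and system language K.
L = {(), ("ok",), ("ok", "ok")}
K = {("err", "ok"), ("err", "err", "ok", "ok")}
```

With these sets, `is_L_stable(K, L, 1)` is false because the second string needs two events before it conforms, while `is_L_stable(K, L, 2)` is true.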

Owing to the finite state nature of the model, all problems in this framework are decidable. Considerable work has been done to extend concepts of conventional control to this domain, in the synthesis of supervisors with full state feedback (Ramadge & Wonham 1987a, 1989; Wonham & Ramadge 1987; Ushio 1990; Willner & Heymann 1991) and also under partial observation (Heymann & Lin 1993; Kozak & Wonham 1995; Takai et al 1995). Work has also been reported dealing with decentralized supervision (Lin & Wonham 1988b, 1990, 1991; Zhong & Wonham 1990; Ozveren & Willsky 1992c; Rudie & Wonham 1992), invertibility of DES (Ozveren & Willsky 1992a), tracking and restrictability (Ozveren & Willsky 1992b), diagnosability (Sampath et al 1995), etc. Work on the realisation of an FSM from a given input-output description has been reported by Kumar et al (1995). However, the computational complexity of algorithms is an important issue and has been receiving increasing attention in recent years (Tsitsiklis 1989; Lin 1991; Rudie & Willems 1995).

3.2 Statecharts 

This is one of the recent approaches to the specification or high-level modelling of large reactive systems. It was initially developed to facilitate an exact and unambiguous description of system behaviour by a large group of engineers of varying backgrounds working on a project (Harel et al 1987). Naturally, the aspects emphasized in this framework are ease of modelling and a co-existing hierarchy of abstraction, so that models can be easily built, possibly by drawing pictures, and details of the structure and functions of the various components need be considered only to the extent necessary. Accordingly, the formalism is graphical, with inherent support for multilevel hierarchical modelling. With this background, we proceed to describe its basic features.




Figure 4. Statechart model of a robot. 



Figure 5. Statechart model of a robot-machine system. 

At the lowest level of abstraction, statecharts are nothing but finite state machines. Consider, therefore, the simple statechart model of a machine shown in figure 3. The external arrow into the idle state indicates that it is the initial state. The most important feature of the statechart is its support for hierarchy. To see how this is captured, consider a robot whose model is shown in figure 4. Now let us consider the machine-robot system. This is captured by a superstate, of which the machine and the assembly robot are orthogonal components. This means that these two components execute their dynamics concurrently. Such orthogonal subsystems are called ‘AND’ states. This is shown in figure 5. Note the synchronization on the event b, which has been introduced to impose a certain restriction on the otherwise independent concurrent behaviours of the FSMs ‘machine’ and ‘robot’. Thus, while the machine can start working at any time, the robot can do so only after the machine has finished its job and is prepared to go back to its idle state. Also, owing to the synchronous nature of the event b, both things happen simultaneously. The overall superstate consisting of the machine and the robot may be seen, at a different level of hierarchy, as one of the states in which a manufacturing cell can exist. This is shown in figure 6, where the above superstate is named ‘In-process’. Note the arrows ‘Start-process’ and ‘End-process’ that terminate on and originate from the state ‘In-process’, respectively. These may be assumed to be commands to the robot and the machine. Thus the ‘Start-process’ command ‘resets’ the whole status of the state ‘In-process’ such that both its components begin from their initial default states. On the other hand, the ‘End-process’ command causes exit from the whole superstate irrespective of which individual states the machine and the robot were in. Thus the ‘End-process’ command stands for both normal termination and abnormal, forced termination. There are various types of entrance to a given state other than by default. One of them is by history, the other by condition. Entry by history is one in which the destination configuration of a transition into a superstate is determined by the component state of the superstate the system was in when it last exited from that superstate. On the other hand, entry by condition implies the evaluation of a condition to determine the starting state. Entry by history is shown in figure 7. Further, state transitions may involve ‘guards’, i.e., boolean conditions usually involving states of other concurrent components and data variables, as well as logical connectives to construct compound conditions for enabling a transition. This is illustrated later in § 4.

Figure 6. Statechart model of a manufacturing cell.

Figure 7. Statechart example of ‘enter by history’.
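Entry by history can be modelled by letting a superstate remember its active substate on exit. The class below is a hypothetical, minimal Python sketch of a single OR-superstate with default entry and history entry; it does not model orthogonal (AND) components, guards or broadcast events.

```python
class OrState:
    """Hypothetical OR-superstate supporting default entry and entry by history."""

    def __init__(self, name, default):
        self.name = name
        self.default = default    # substate entered by default
        self.history = None       # substate that was active at the last exit
        self.active = None        # currently active substate; None when not active

    def enter(self, by_history=False):
        """Enter the superstate by history (if any history exists) or by default."""
        if by_history and self.history is not None:
            self.active = self.history
        else:
            self.active = self.default
        return self.active

    def move(self, target):
        """Internal transition between substates."""
        if self.active is None:
            raise RuntimeError(f"{self.name} is not active")
        self.active = target

    def exit(self):
        """Leave the superstate, recording the active substate as its history."""
        self.history = self.active
        self.active = None


machine = OrState("machine", default="idle")
machine.enter()                 # default entry: 'idle'
machine.move("working")
machine.exit()                  # e.g. a forced exit such as 'End-process'
machine.enter(by_history=True)  # resumes in 'working'
```

A plain `enter()` after the next exit would return to the default state idle, which corresponds to the ‘reset’ behaviour of the ‘Start-process’ command.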

One of the important features of statecharts is the absence of scope restrictions on 
variables and events. Hence any state or data variable can be used anywhere and at any level 
in this formalism. Similarly, any event occurring in any component is instantly available 
or broadcast to all other components. While these features certainly ease modelling, the 
realisation of such features may pose considerable difficulties. 

Statecharts offer limited support for modelling time-related features. There is a ‘timeout’ event of the form on-time-out(E, T), which defines a new event that occurs T time units after the occurrence of the event E, where T is an integer expression. A second construct is that of scheduled activity, denoted by schedule(G, T), which schedules G to occur T time instants after the current instant. In addition, both synchronous and asynchronous simulation models are possible for statecharts.

The formalism of statecharts has not yet been standardised; many other features exist and are currently being examined for implementation. One such implementation, called STATEMATE, has been reported (Harel et al 1990; iLOGIX corporation). The formal semantics of statecharts are given by Harel et al (1987). Applications of statecharts have been reported in several fields, such as hardware description and synthesis (Drusinsky & Harel 1989), aspects of hardware implementation (Drusinsky 1991) and application of the software STATEMATE to real problems (Harel et al 1990). A special form of the statecharts formalism has recently been introduced for tackling control and observation problems (Brave & Heymann 1993).

In summary, the statecharts formalism is easy to use and supports many user-friendly 
features for modelling. 

3.3 Timed transition models (TTM) 

This is also a state-based formalism with some similarity to FSMs. Two important distinguishing features of the framework are the admission of activity and data variables, and the introduction of lower and upper time bounds on state transitions. These extensions of a state machine have been made in order to use Real-time Temporal Logic (RTTL) (Manna & Pnueli 1982; Thistle & Wonham 1986) to specify real-time behaviour that can be automatically verified (Clarke et al 1986). Therefore, while some compression of the state space is possible over ordinary FSMs, the formalism still inherits the weaknesses of FSMs, such as lack of support for hierarchy and a very large state space, as well as their strengths, such as decidability of many properties and the existence of verification algorithms for them. However, a TTM can have infinitely many states, mainly because of its data variables.

A TTM is basically composed of three entities, namely, a set of variables which define 
the state of a process, a set of state transitions or events and an initial condition. An event, 
in turn, is parameterized by four quantities. They are:

• a boolean enabling condition, which is evaluated at every state; if the condition holds true, the associated transition is enabled;

• a state transition function that defines the transition corresponding to the event;

• a lower time bound, which prevents the event from taking place before this limit is reached;

• an upper time bound, which, when reached, forces the transition to occur.
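The role of the four parameters can be sketched in Python as follows. This is a simplified, assumed reading of the timing semantics in which each event counts the ticks elapsed since it became enabled: it may occur only after at least $l$ ticks, and once $u$ ticks have elapsed a further tick is not allowed, so the event is forced. The transition function is deterministic here, whereas the TTM formalism allows it to be nondeterministic; the event `finish` and its bounds are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict

State = Dict[str, int]


@dataclass
class TTMEvent:
    enabling: Callable[[State], bool]     # e: boolean enabling condition
    transition: Callable[[State], State]  # h: state transition (deterministic in this sketch)
    lower: int                            # l: earliest tick count at which it may occur
    upper: float                          # u: tick count at which it is forced (float("inf") = none)


def may_occur(ev: TTMEvent, state: State, ticks_enabled: int) -> bool:
    """The event may occur once it has been enabled for at least l ticks."""
    return ev.enabling(state) and ticks_enabled >= ev.lower


def tick_allowed(ev: TTMEvent, state: State, ticks_enabled: int) -> bool:
    """A tick is blocked once an enabled event has reached its upper bound,
    so the event must occur before time can advance further."""
    return not (ev.enabling(state) and ticks_enabled >= ev.upper)


# Hypothetical event: a working machine (x == 1) finishes 2 to 4 ticks after starting.
finish = TTMEvent(
    enabling=lambda s: s["x"] == 1,
    transition=lambda s: {**s, "x": 0},
    lower=2,
    upper=4,
)
```

For a working machine, finish cannot occur after one tick but may occur after two, and after four ticks no further tick is allowed until finish has taken place.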

In the original TTM formalism (Ostroff & Wonham 1990), the state transition map $h : Q \to 2^Q$, $Q$ being the state space, is allowed to be nondeterministic. Though its domain is left unspecified, it is effectively determined by the enabling condition. To measure time there is a special event called tick. Only this event increments the time data variable $t$, usually an integer. The time bounds of all other events are specified in terms of $t$. In a departure from standard FSMs, non-unique initial states are admitted through the boolean-valued initial condition. Thus any state $q$ is a valid initial state provided $\Theta(q)$ is true, where $\Theta$ is the initial condition expression. Concurrency is supported in the formalism. Two identically named events $\tau = [e_1, h_1, u_1, l_1]$ in TTM $M_1$ and $\tau = [e_2, h_2, u_2, l_2]$ in TTM $M_2$, where $e_i$ stands for the enabling condition, $h_i$ for the state transition function, and $u_i$ and $l_i$ for the upper and lower time bounds respectively, may be composed as a shared event $\tau$ in the synchronous operation of $M_1$ and $M_2$ as

$$\tau = [e_1 \wedge e_2,\ h_1 \circ h_2,\ \min(u_1, u_2),\ \max(l_1, l_2)].$$
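The composition rule can be written directly as a Python function on 4-tuples $(e, h, u, l)$, reading $\circ$ as ordinary function composition (an assumption). The events and the shared variable x below are hypothetical; composing them in the two possible orders shows that $h_1 \circ h_2$ and $h_2 \circ h_1$ need not agree.

```python
def compose(ev1, ev2):
    """Shared event tau = [e1 ∧ e2, h1 ∘ h2, min(u1, u2), max(l1, l2)].
    Each event is a tuple (e, h, u, l) in the order used in the text."""
    e1, h1, u1, l1 = ev1
    e2, h2, u2, l2 = ev2
    return (
        lambda q: e1(q) and e2(q),   # enabled only if enabled in both TTMs
        lambda q: h1(h2(q)),         # h1 ∘ h2: apply h2 first, then h1
        min(u1, u2),                 # the tighter upper bound
        max(l1, l2),                 # the tighter lower bound
    )


# Hypothetical events acting on a shared variable x.
tau1 = (lambda q: q["x"] >= 0, lambda q: {**q, "x": q["x"] + 1}, 5, 1)
tau2 = (lambda q: True, lambda q: {**q, "x": 2 * q["x"]}, 3, 2)

e12, h12, u12, l12 = compose(tau1, tau2)
e21, h21, _, _ = compose(tau2, tau1)
```

Starting from x = 1, `h12` gives x = 3 while `h21` gives x = 4, so the result of the shared event depends on the order in which the component transition functions are applied.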

Nondeterminism may arise if the two functions $h_1$ and $h_2$ do not commute. This issue, however, has not been dealt with. Additionally, explicit interprocess communication is supported in the syntax through synchronized send and receive events along specific channels, much in the same way as proposed in CSP (Hoare 1985). For example, the two events $c!x$ (send) and $c?x$ (receive) occurring synchronously in two different processes indicate a communication between them involving the current value of $x$ through channel $c$.

Communication can also be used for sending and receiving commands. This feature may be used to model the forcing of a transition by a supervisor process. Note that at any state of the process, a number of events may be enabled, i.e., their enabling conditions may evaluate to true. An internal choice mechanism, left unmodelled, then determines which of these events occurs. Events are still asynchronous, and in the overall event trajectory they may be interleaved in arbitrary fashion with ‘tick’ sequences, subject to the timing constraints.

To summarize, the main distinctive features of the TTM are the variables and the upper and lower time bounds. The time bounds may be used to answer questions related to worst/best case timings. Control of TTMs is achieved by synchronization, sharing of events or command communications. Specification of timed behaviour may be given using RTTL (Ostroff & Wonham 1990). Decidability of temporal logic specifications for DES has been proved (Knight & Passino 1990; Ostroff 1990). Recently, a modified form of this formalism has been used to verify the correct operation of real-time control systems (Lawford & Wonham 1995).

3.4 Petri Net models 

Petri Nets (PN) are one of the oldest and most popular modelling frameworks for discrete event dynamic systems. Ever since their introduction in C A Petri’s doctoral dissertation (Petri 1962), they have been widely used, along with their many variants, for modelling a large variety of systems such as computer software and hardware, distributed database systems, communication protocols, industrial process control, formal languages, flexible manufacturing systems, etc. The usefulness of this modelling framework is that, apart from answering many logical questions, it can also address issues related to the performance of the modelled system. A number of tutorial and survey papers on this topic have appeared recently in the literature (Murata 1989; David & Alla 1994), along with some books (Peterson 1981).

A Petri Net can be viewed as a modelling tool with both graphical and mathematical features. The graphical nature helps in the visual interpretation of the model, while the mathematical aspect allows easy analysis. This fact, combined with the descriptive power of the nets, makes it a very attractive tool for practitioners as well as theoreticians.

A Petri Net, in its basic form, is a directed graph with two kinds of nodes, namely places and transitions. A place is denoted by a circle, while a transition is represented by a rectangular box or a bar. The arcs are directed and connect places to transitions or transitions to places. That is, the structure of the Petri Net is a bipartite graph. See figure 8 for an illustration, where the $p_i$'s denote places and the $t_i$'s stand for transitions.

Figure 8. Example of a Petri Net.

To each place is associated a nonnegative variable whose value is equal to the number of marks or tokens in it at any instant. The number of tokens can correspond to some data or to the satisfaction of some conditions associated with the place. A marking $M$ is a state vector that captures the distribution of tokens in all the places of the net. A transition (event) has a certain number of input and output places. The token distribution changes when a transition fires, according to the following rules.

(i) A transition can fire only if it is enabled. It is said to be enabled, when each of its input 
places contains at least one token. 

(ii) When a transition fires, each of the input places loses one token, while each of the 
output places gains one. 

Note that satisfying condition (i) alone does not ensure that the transition will fire.
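The two firing rules translate into a few lines of Python. This sketch assumes an ordinary net (every arc has weight one), with each transition given by its sets of input and output places and the marking stored as a dictionary from place names to token counts; the place and transition names are hypothetical.

```python
from typing import Dict, Set

Marking = Dict[str, int]


def is_enabled(pre: Set[str], marking: Marking) -> bool:
    """Rule (i): each input place holds at least one token."""
    return all(marking.get(p, 0) >= 1 for p in pre)


def fire(pre: Set[str], post: Set[str], marking: Marking) -> Marking:
    """Rule (ii): each input place loses one token, each output place gains one.
    Returns a new marking and leaves the old one unchanged."""
    if not is_enabled(pre, marking):
        raise ValueError("transition is not enabled")
    new = dict(marking)
    for p in pre:
        new[p] -= 1
    for p in post:
        new[p] = new.get(p, 0) + 1
    return new


# Hypothetical transition t1 with input places p1, p2 and output place p3.
M0 = {"p1": 1, "p2": 1, "p3": 0}
M1 = fire({"p1", "p2"}, {"p3"}, M0)
```

After firing, `M1` is `{"p1": 0, "p2": 0, "p3": 1}` and t1 is no longer enabled. Whether an enabled transition actually fires is decided outside these functions, in line with the remark above.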

A major advantage of the PN framework is the associated support for the analysis of many properties and problems. Some properties, known as behavioural properties, depend on the initial marking $M_0$. Other properties depend only on the topological structure of the net; these are known as structural properties.

Among the behavioural properties of Petri Nets, some important ones are described 
below. For further details the reader is referred to Murata (1989). 

• Reachability: A marking $M_r$ is said to be reachable from a marking $M_i$ if there exists a firing sequence that transforms $M_i$ to $M_r$. A firing sequence is often denoted as $s = M_0 t_1 M_1 \ldots t_n M_n$ or simply as $s = t_1 \ldots t_n$. In this case, $M_n$ is reachable from $M_0$, denoted by $M_0[s\rangle M_n$. The set of all possible markings reachable from $M_0$ in a net $N$ is denoted by $R(N, M_0)$ or $R(M_0)$. The set of all possible firing sequences from $M_0$ is denoted by $L(N, M_0)$ or $L(M_0)$. The reachability problem is to determine whether $M_n \in R(M_0)$ for a given marking $M_n$ in a net $(N, M_0)$. It has been shown that this problem is decidable, although of at least exponential complexity (Murata 1989).

• Liveness: A marking $M_i$ is said to be live if, from every marking reachable from it, any transition of the net can eventually be fired, thus ensuring deadlock-free operation. Liveness is a very strong requirement for many systems. Several levels of liveness have been defined, ranging from ‘dead’ (L0-live) to L4-live (or live). See Murata (1989) and the references therein for details.

• Boundedness: A marking $M_i$ is said to be $l$-bounded if there exists an integer $l$ such that each place of the net has at most $l$ tokens for every marking reachable from $M_i$. In particular, a Petri Net is said to be safe if it is 1-bounded. Since places in a Petri Net are often used to represent buffers and registers, verifying whether a net is bounded or safe is an important problem.

• Reversibility: A PN is said to be reversible if, for each marking $M$ in $R(M_0)$, $M_0$ is reachable from $M$. This is therefore a kind of inverse reachability. Often, reversibility is defined with respect to some home state rather than the initial state $M_0$.

• Persistence: A PN is said to be persistent, if, for any two enabled transitions, the firing 
of one transition will not disable the other. The concept is closely related to conflict 
freeness. 

• Fairness: Two basic concepts of fairness are the following. 

- Bounded fairness. Two transitions $t_1$ and $t_2$ are said to be in a bounded-fair (or B-fair) relation if the maximum number of times that either one can fire while the other is not firing is bounded. A net is said to be B-fair if every pair of transitions is in a B-fair relation.

- Unconditional (global) fairness. A firing sequence s is said to be unconditionally or 
globally fair, if it is finite or every transition in the net appears infinitely often in s. 
The net is said to be unconditionally fair if every firing sequence has this property. 
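For a bounded net, the reachability set $R(M_0)$ can be enumerated by breadth-first search over firings, after which boundedness, safeness and deadlocks can be read off. The Python sketch below makes the boundedness assumption explicit with a state-count guard: for unbounded nets plain enumeration does not terminate, and the coverability tree described by Murata (1989) is the usual tool instead. The nets used are hypothetical.

```python
from collections import deque


def reachability_set(places, net, m0, max_states=10_000):
    """Enumerate R(M0) by breadth-first search.
    net maps each transition to (input places, output places) of an ordinary net;
    markings are tuples ordered like `places`."""
    idx = {p: i for i, p in enumerate(places)}
    start = tuple(m0)
    seen = {start}
    queue = deque([start])
    while queue:
        m = queue.popleft()
        for pre, post in net.values():
            if not all(m[idx[p]] >= 1 for p in pre):
                continue  # transition not enabled at m
            nxt = list(m)
            for p in pre:
                nxt[idx[p]] -= 1
            for p in post:
                nxt[idx[p]] += 1
            nxt = tuple(nxt)
            if nxt not in seen:
                if len(seen) >= max_states:
                    raise RuntimeError("state limit reached; the net may be unbounded")
                seen.add(nxt)
                queue.append(nxt)
    return seen


def token_bound(reachable):
    """Smallest l for which the net is l-bounded from M0 (safe iff the result is <= 1)."""
    return max(max(m) for m in reachable)


def dead_markings(places, net, reachable):
    """Reachable markings in which no transition is enabled (deadlocks)."""
    idx = {p: i for i, p in enumerate(places)}
    return {m for m in reachable
            if not any(all(m[idx[p]] >= 1 for p in pre) for pre, _ in net.values())}


# Hypothetical nets over places p1, p2.
PLACES = ["p1", "p2"]
CYCLE = {"t1": ({"p1"}, {"p2"}), "t2": ({"p2"}, {"p1"})}
ONE_WAY = {"t1": ({"p1"}, {"p2"})}
R = reachability_set(PLACES, CYCLE, (2, 0))
```

Here `R` is `{(2, 0), (1, 1), (0, 2)}`, so the cyclic net is 2-bounded (not safe) and has no deadlock, whereas the one-way net started from `(1, 0)` deadlocks in `(0, 1)`.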

These properties can be systematically verified by representing the net in the form of an incidence matrix $A = [a_{ij}]$, which has $n$ rows, one for each transition, and $m$ columns, one for each place. It consists of integer elements, and a typical entry is given by

$$a_{ij} = a_{ij}^{+} - a_{ij}^{-},$$

where $a_{ij}^{+}$ is the number of arcs from transition $i$ to its output place $j$ and $a_{ij}^{-}$ is the number of arcs to transition $i$ from its input place $j$. In essence, $a_{ij}^{-}$, $a_{ij}^{+}$ and $a_{ij}$ represent the number of tokens removed from, added to and changed in place $j$ when transition $i$ fires. Transition $i$ is enabled at marking $M$ if and only if $a_{ij}^{-} \le M(j)$, $j = 1, 2, \ldots, m$, where $M(j)$ denotes the number of tokens in place $j$ under marking $M$.

Denoting the marking after the $k$th firing of the net as $M_k$, one can write the state equation

$$M_k = M_{k-1} + A^T u_k, \quad k = 1, 2, \ldots,$$

where $u_k$ is an $n \times 1$ binary column vector containing exactly one non-zero entry (equal to 1), in the $i$th position corresponding to the transition that has fired at the $k$th instant.
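The incidence matrix and the state equation can be sketched with plain Python lists. The two-transition, three-place net below is hypothetical. Note that the state equation only updates the marking; whether the chosen transition is actually enabled must be checked separately with the condition $a_{ij}^{-} \le M(j)$.

```python
def incidence_matrix(transitions, places, pre, post):
    """A = [a_ij] with a_ij = a_ij^+ - a_ij^-; rows are transitions, columns are places.
    pre[t][p] and post[t][p] give arc counts (missing entries mean 0)."""
    return [[post[t].get(p, 0) - pre[t].get(p, 0) for p in places] for t in transitions]


def is_enabled(M, pre_t, places):
    """Transition enabled iff a_ij^- <= M(j) for every place j."""
    return all(pre_t.get(p, 0) <= M[j] for j, p in enumerate(places))


def state_equation_step(M_prev, A, i):
    """M_k = M_{k-1} + A^T u_k, with u_k the unit vector selecting transition i."""
    n = len(A)
    u = [1 if r == i else 0 for r in range(n)]
    return [M_prev[j] + sum(A[r][j] * u[r] for r in range(n)) for j in range(len(M_prev))]


# Hypothetical net: t1 takes a token from p1 and p2 and puts one in p3; t2 reverses this.
T = ["t1", "t2"]
P = ["p1", "p2", "p3"]
PRE = {"t1": {"p1": 1, "p2": 1}, "t2": {"p3": 1}}
POST = {"t1": {"p3": 1}, "t2": {"p1": 1, "p2": 1}}

A = incidence_matrix(T, P, PRE, POST)   # [[-1, -1, 1], [1, 1, -1]]
M0 = [1, 1, 0]
M1 = state_equation_step(M0, A, 0)      # fire t1 -> [0, 0, 1]
```

Firing t2 from `M1` via `state_equation_step(M1, A, 1)` returns the marking to `[1, 1, 0]`.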

The problems of reachability, liveness and boundedness are fortunately decidable in this framework, and algorithms exist for their verification (Murata 1989).

Structural properties comprise the following. 

• Structural liveness: A PN is said to be structurally live, if there exists a live initial 
marking. 

• Controllability: A net is said to be completely controllable if any marking is reachable from any other marking. It can be shown that if a PN with $m$ places is completely controllable, then rank $A = m$ (Murata 1989). For some special classes of PN, this condition is also sufficient.
• Structural boundedness: A net is said to be structurally bounded if it is bounded for any finite initial marking $M_0$. This property is characterized by the existence of an $m$-vector $y$ of positive integers such that $Ay \le 0$ (Murata 1989). A place $p$ in a PN is structurally unbounded if there exists a marking $M_0$ and a firing sequence $s$ from $M_0$ such that $p$ is unbounded.

• Conservativeness: A net is said to be (partially) conservative if there exists a positive integer $y(p)$ for every (some) place $p$ such that the weighted sum of tokens $M^T y = M_0^T y$ is a constant for every $M \in R(M_0)$ and for any fixed initial marking $M_0$. This property is characterized by the existence of an $m$-vector $y$ of positive (nonnegative) integers such that $Ay = 0$, $y \ne 0$.

• Repetitiveness: A PN is said to be (partially) repetitive if there exists a marking $M_0$ and a firing sequence $s$ from $M_0$ such that every (some) transition occurs infinitely often in $s$. This property is satisfied iff there exists an $n$-vector $x$ of positive (nonnegative) integers such that $A^T x \ge 0$, $x \ne 0$.

• Consistency: A net is said to be (partially) consistent if there exists a marking $M_0$ and a firing sequence $s$ from $M_0$ back to $M_0$ such that every (some) transition occurs at least once in $s$. The necessary and sufficient condition for consistency is the existence of an $n$-vector $x$ of positive (nonnegative) integers such that $A^T x = 0$, $x \ne 0$.

• S- and T-invariance: An $m$-vector $y$ ($n$-vector $x$) of integers is called an S-invariant (T-invariant) if $Ay = 0$ ($A^T x = 0$).

• Structural B-fairness: This is an extension of the B-fairness property to any initial 
marking. 
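Several of the structural conditions above are linear-algebraic and easy to test for a given candidate vector. The Python functions below check a proposed $y$ or $x$ against $Ay = 0$, $A^T x = 0$ and ($y > 0$, $Ay \le 0$); finding such vectors in general requires solving a system of linear (in)equalities over the integers, which this sketch does not attempt. The incidence matrix is a hypothetical two-transition, three-place example (rows $t_1$, $t_2$; columns $p_1$, $p_2$, $p_3$).

```python
def is_s_invariant(A, y):
    """y (one entry per place) is an S-invariant iff A y = 0."""
    return all(sum(a * w for a, w in zip(row, y)) == 0 for row in A)


def is_t_invariant(A, x):
    """x (one entry per transition) is a T-invariant iff A^T x = 0."""
    return all(sum(A[i][j] * x[i] for i in range(len(A))) == 0 for j in range(len(A[0])))


def certifies_structural_boundedness(A, y):
    """True if the candidate y is componentwise positive and satisfies A y <= 0."""
    return all(w > 0 for w in y) and all(sum(a * w for a, w in zip(row, y)) <= 0 for row in A)


def weighted_tokens(M, y):
    """Weighted token count M^T y."""
    return sum(m * w for m, w in zip(M, y))


# Hypothetical incidence matrix: t1 moves tokens from p1, p2 to p3; t2 does the reverse.
A = [[-1, -1, 1],
     [1, 1, -1]]
y = [1, 1, 2]   # positive weights with A y = 0, so the net is conservative
x = [1, 1]      # firing t1 and t2 once each reproduces the marking (consistency)
```

With these vectors, the weighted token count is the same (equal to 2) for the markings `[1, 1, 0]` and `[0, 0, 1]`, as conservativeness requires.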

There are several abbreviations and extensions of the basic Petri Nets. Abbreviations are simplified graphical representations having a one-to-one correspondence with the basic net and, naturally, the same descriptive power. Examples of abbreviations are the generalized PN, where positive integral weights are associated with the arcs; the coloured PN, in which each token has an identification (colour); and the finite capacity PN, where a transition cannot be fired if that would cause the prespecified token capacity of one of its output places to be exceeded.

Among several extensions to the basic PN are the priority PN, where there exists a partial order relation on the net transitions, and the continuous PN, in which the marking of a place is a nonnegative real number and is not restricted to be an integer.

All the types of PN discussed above are known as autonomous nets, which can deal only with logical questions related to the net. Extensions of Petri Nets that allow one to address timing and performance-related issues are known as non-autonomous. Among them is the Synchronized Petri Net, in which the firing of transitions is synchronized with external events: a transition fires if it is enabled and the associated event occurs.

The Timed Petri Net is useful for evaluating the performance of a system by associating a timing either with the places (P-timed) or with the transitions (T-timed). In a P-timed PN, tokens become available at a place $p_i$ only after a certain time $d_i$ has elapsed since their deposition. Normally, the succeeding transition occurs as soon as all its input places have available tokens, unless some conflict occurs. The T-timed PN is the dual of this.
Interpreted Petri Nets allow modelling of logical controllers and real-time systems and
have the following features: 

(i) They are synchronized. 

(ii) They are P-timed. 

(iii) They have a data-processing feature, where the state vectors can be modified in arbitrary ways depending upon the events taking place. A transition can be fired only if it is enabled and a certain boolean condition associated with it is true.

The Stochastic Petri Net is a generalization of the T-timed PN in which a random time is associated with the firing of a transition. The firing times are generally assumed to be exponentially distributed, which allows performance to be analysed in a probabilistic setting using tools such as Markov chains and queueing theory.
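The exponential assumption is what links the net to a Markov chain. In a marking where transitions with rates $\lambda_i$ are enabled, exponential delays are memoryless, so the time to the next firing is exponential with rate $\sum_i \lambda_i$ and transition $i$ fires first with probability $\lambda_i / \sum_j \lambda_j$. The sketch below computes these quantities for hypothetical rates and checks them with a small simulation; the names and rates are our own.

```python
import random
from typing import Dict, Tuple


def race(rates: Dict[str, float]) -> Tuple[float, Dict[str, float]]:
    """Exponential race among the enabled transitions of one marking."""
    total = sum(rates.values())
    mean_sojourn = 1.0 / total            # time to first firing ~ Exp(total)
    probs = {t: r / total for t, r in rates.items()}
    return mean_sojourn, probs


def simulate(rates: Dict[str, float], n: int = 20000, seed: int = 1) -> Dict[str, float]:
    """Monte Carlo check: sample one delay per transition; the smallest wins."""
    rng = random.Random(seed)
    wins = dict.fromkeys(rates, 0)
    for _ in range(n):
        delays = {t: rng.expovariate(r) for t, r in rates.items()}
        wins[min(delays, key=delays.get)] += 1
    return {t: w / n for t, w in wins.items()}


mean, probs = race({"t1": 1.0, "t2": 3.0})
# mean == 0.25 and probs == {"t1": 0.25, "t2": 0.75}
```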

Petri Nets have the advantage that qualitative analysis is possible before any quantitative evaluation is performed. Their disadvantage is that, unlike some structured languages and Statecharts, the PN is not a structured modelling tool. However, different approaches have recently been suggested for systematic model building using PN (Ferrarini 1992; Zhou et al 1993; Ausfelder et al 1994; Ferrarini et al 1994).

3.5 Algebraic models 

Process algebra models aim at describing complex DES in terms of an algebra consisting of a few basic terms, which can be related to some fundamental processes, and a number of operators, which can be used to form complex terms (processes) from simpler ones (Hennessy 1988). The basic terms represent some simple DES behaviour. The different operators arise naturally from the physical fact that simpler DESs operate concurrently or sequentially to produce complex behaviours. Such models have been investigated for nearly two decades to provide clear mathematical semantics for programming languages, and the theory developed is general enough to model DES dynamics. Among the different variants of this technique, the finitely recursive process (FRP) model, based on the framework of communicating sequential processes (CSP), has been proposed recently (Inan & Varaiya 1988).

The central notion in FRP is a process. A process describes the behaviour of a DES in terms of three components: alphabet, trace and termination. Formally, $P = (\alpha P, \mathrm{tr}P, \tau P)$. The alphabet of P, $\alpha P$, is a set of event symbols, each of which is the abstraction of a physical event that matters for the dynamics of the physical process modelled by P; in FRP the alphabet may change as the process evolves, and $\alpha P(s)$ denotes the alphabet after the trace $s$. The trace set of P, $\mathrm{tr}P$, a set of strings of event symbols, captures the event trajectories that can 'take place' in the physical process described by P. The termination function $\tau P$ indicates whether the process has terminated successfully ($\tau P(s) = 1$) or not ($\tau P(s) = 0$).

The basic processes of this framework are $\mathrm{STOP}_A$ and $\mathrm{SKIP}_A$. Each of them is a do-nothing process, i.e., its only trace is the empty trace. The difference between them is that the former represents a deadlock or unsuccessful termination, whereas the latter represents successful termination.

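$$\mathrm{STOP}_A := \big(\mathrm{tr}\,\mathrm{STOP}_A := \{\langle\rangle\},\ \alpha\mathrm{STOP}_A(\langle\rangle) := A,\ \tau\mathrm{STOP}_A(\langle\rangle) := 0\big)$$

$$\mathrm{SKIP}_A := \big(\mathrm{tr}\,\mathrm{SKIP}_A := \{\langle\rangle\},\ \alpha\mathrm{SKIP}_A(\langle\rangle) := A,\ \tau\mathrm{SKIP}_A(\langle\rangle) := 1\big)$$

The important operators are as follows.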

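• Deterministic choice operator (DCO): Given processes $P_1, \ldots, P_n$ and events $\{\sigma_1, \ldots, \sigma_n\} \subseteq A$, the process $P = (\sigma_1 \to P_1 \mid \cdots \mid \sigma_n \to P_n)_{A,\tau}$ is defined as follows. If $\tau = 1$,

$$P := \mathrm{SKIP}_A.$$

If $\tau = 0$, then

$$\begin{aligned}
\mathrm{tr}P &:= \{\langle\rangle\} \cup \{\langle\sigma_i\rangle{}^\frown s \mid s \in \mathrm{tr}P_i,\ 1 \le i \le n\},\\
\alpha P(\langle\rangle) &:= A, \qquad \alpha P(\langle\sigma_i\rangle{}^\frown s) := \alpha P_i(s),\\
\tau P(\langle\rangle) &:= 0, \qquad \tau P(\langle\sigma_i\rangle{}^\frown s) := \tau P_i(s).
\end{aligned}$$

That is, the initial events of P are drawn from the set $\{\sigma_1, \ldots, \sigma_n\}$, and after $\sigma_i$ occurs, P behaves as $P_i$. As an example, consider the process $P = (a \to P \mid b \to \mathrm{SKIP}_\emptyset)_{\{a,b\},0}$, which generates the language $\{a^i b \mid i \ge 0\}$.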

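• Sequential composition operator (SCO): Given $P_1$ and $P_2$, $P_1;P_2$ is defined as follows:

$$\alpha(P_1;P_2)(\langle\rangle) := \begin{cases} \alpha P_1(\langle\rangle), & \text{if } \tau P_1(\langle\rangle) = 0,\\ \alpha P_2(\langle\rangle), & \text{if } \tau P_1(\langle\rangle) = 1, \end{cases}$$

$$\tau(P_1;P_2)(\langle\rangle) := \begin{cases} 0, & \text{if } \tau P_1(\langle\rangle) = 0,\\ \tau P_2(\langle\rangle), & \text{if } \tau P_1(\langle\rangle) = 1, \end{cases}$$

$$(P_1;P_2)/\langle\sigma\rangle := \begin{cases} (P_1/\langle\sigma\rangle);P_2, & \text{if } \langle\sigma\rangle \in \mathrm{tr}P_1,\\ P_2/\langle\sigma\rangle, & \text{if } \tau P_1(\langle\rangle) = 1 \wedge \langle\sigma\rangle \in \mathrm{tr}P_2,\\ \text{undefined}, & \text{otherwise}. \end{cases}$$

Here $P_1;P_2$ initially behaves as $P_1$ and, after $P_1$ terminates successfully, it behaves as $P_2$. Consider the processes $Q = (a \to Q;R \mid b \to \mathrm{SKIP}_\emptyset)_{\{a,b\},0}$ and $R = (a \to \mathrm{SKIP}_\emptyset)_{\{a\},0}$. Then Q generates the language $\{a^i b a^i \mid i \ge 0\}$.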

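• Parallel composition operator (PCO): Given $P_1$ and $P_2$, $P_1 \| P_2$ is defined as follows:

$$\alpha(P_1 \| P_2)(\langle\rangle) := \alpha P_1(\langle\rangle) \cup \alpha P_2(\langle\rangle),$$

$$\tau(P_1 \| P_2)(\langle\rangle) := \begin{cases} 1, & \text{if } \big(\tau P_1(\langle\rangle) = 1 \wedge \tau P_2(\langle\rangle) = 1\big)\\ & \quad \vee \big(\tau P_1(\langle\rangle) = 1 \wedge \alpha P_2(\langle\rangle) \subseteq \alpha P_1(\langle\rangle)\big)\\ & \quad \vee \big(\tau P_2(\langle\rangle) = 1 \wedge \alpha P_1(\langle\rangle) \subseteq \alpha P_2(\langle\rangle)\big),\\ 0, & \text{otherwise}, \end{cases}$$

$$(P_1 \| P_2)/\langle\sigma\rangle := \begin{cases} (P_1/\langle\sigma\rangle) \| P_2, & \text{if } \langle\sigma\rangle \in \mathrm{tr}P_1 \wedge \sigma \notin \alpha P_2(\langle\rangle),\\ (P_2/\langle\sigma\rangle) \| P_1, & \text{if } \langle\sigma\rangle \in \mathrm{tr}P_2 \wedge \sigma \notin \alpha P_1(\langle\rangle),\\ (P_1/\langle\sigma\rangle) \| (P_2/\langle\sigma\rangle), & \text{if } \langle\sigma\rangle \in \mathrm{tr}P_1 \cap \mathrm{tr}P_2,\\ \text{undefined}, & \text{otherwise}. \end{cases}$$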
It represents the concurrent behaviour of $P_1$ and $P_2$. Any interleaving of event sequences from $P_1$ and $P_2$ can take place in $P_1 \| P_2$, except that any event belonging to their common alphabet must take place synchronously in both components; it then appears as a single event in the trace.

Because the alphabet is variable, unlike in CSP, there is normally no global set of synchronised events in parallel composition. However, constant synchronization can be modelled using the global change operator. Consider the processes $S = (a \to S \mid b \to \mathrm{SKIP}_\emptyset)_{\{a,b\},0}$ and $T = (a \to \mathrm{SKIP}_{\{b\}})_{\{a\},0}$. Then $\mathrm{tr}(S \| T) = \{a^i \mid i \ge 0\} \cup \{b, ba\}$: once $a$ has occurred, T has become $\mathrm{SKIP}_{\{b\}}$, whose alphabet $\{b\}$ blocks the event $b$ of S.

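• Local change operator (LCO): Given a process P and sets of events B and C, the LCO $P^{[-B+C]}$ is defined as follows:

$$\alpha P^{[-B+C]}(\langle\rangle) := (\alpha P(\langle\rangle) - B) \cup C,$$

$$\tau P^{[-B+C]}(\langle\rangle) := \tau P(\langle\rangle),$$

$$P^{[-B+C]}/\langle\sigma\rangle := \begin{cases} P/\langle\sigma\rangle, & \text{if } \langle\sigma\rangle \in \mathrm{tr}P \wedge \sigma \notin B,\\ \text{undefined}, & \text{otherwise}. \end{cases}$$

As an example, for the process P used in the example of DCO, $P^{[-\{b\}+\{c\}]}$ generates the language $\{a^i b \mid i > 0\}$: the change applies only before the first event, so only an initial $b$ is blocked.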

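• Global change operator (GCO): Given a process P and sets of events B and C, the GCO $P^{[[-B+C]]}$ is defined as follows:

$$\alpha P^{[[-B+C]]}(\langle\rangle) := (\alpha P(\langle\rangle) - B) \cup C,$$

$$\tau P^{[[-B+C]]}(\langle\rangle) := \tau P(\langle\rangle),$$

$$P^{[[-B+C]]}/\langle\sigma\rangle := \begin{cases} (P/\langle\sigma\rangle)^{[[-B+C]]}, & \text{if } \langle\sigma\rangle \in \mathrm{tr}P \wedge \sigma \notin B,\\ \text{undefined}, & \text{otherwise}. \end{cases}$$

Again, for the process P used in the example of DCO, $P^{[[-\{b\}+\{c\}]]} = Z = (a \to Z)_{\{a,c\},0}$.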
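The operator definitions above can be executed almost verbatim. The sketch below is our own illustration, not the FRP tooling of Inan & Varaiya: a process is represented by its current alphabet $\alpha P(\langle\rangle)$, its termination flag $\tau P(\langle\rangle)$ and its map $\sigma \mapsto P/\langle\sigma\rangle$, on which STOP, SKIP, the DCO (with $\tau = 0$), SCO, PCO, LCO and GCO are built, and traces are enumerated up to a bounded length. It assumes that every event a process can engage in belongs to its current alphabet, which holds for the examples in the text.

```python
from typing import Callable, FrozenSet, Iterable, List, Optional, Set, Tuple

Trace = Tuple[str, ...]


class Proc:
    """A process at its current point: alpha P(<>), tau P(<>) and e -> P/<e>."""
    def __init__(self, alpha: Iterable[str], tau: int,
                 after: Callable[[str], Optional["Proc"]]):
        self.alpha: FrozenSet[str] = frozenset(alpha)
        self.tau = tau
        self.after = after  # returns P/<e>, or None if <e> is not a trace of P


def STOP(A): return Proc(A, 0, lambda e: None)  # deadlock: tau = 0
def SKIP(A): return Proc(A, 1, lambda e: None)  # successful termination


def choice(branches, A):
    """DCO with tau = 0; branches maps event -> thunk so recursion stays lazy."""
    return Proc(A, 0, lambda e: branches[e]() if e in branches else None)


def seq(p1, p2):
    """SCO  P1; P2."""
    alpha, tau = (p2.alpha, p2.tau) if p1.tau == 1 else (p1.alpha, 0)
    def after(e):
        r1 = p1.after(e)
        if r1 is not None:
            return seq(r1, p2)
        return p2.after(e) if p1.tau == 1 else None
    return Proc(alpha, tau, after)


def par(p1, p2):
    """PCO  P1 || P2."""
    a1, a2 = p1.alpha, p2.alpha
    tau = int((p1.tau == 1 and p2.tau == 1) or (p1.tau == 1 and a2 <= a1)
              or (p2.tau == 1 and a1 <= a2))
    def after(e):
        r1, r2 = p1.after(e), p2.after(e)
        if r1 is not None and e not in a2:
            return par(r1, p2)
        if r2 is not None and e not in a1:
            return par(p1, r2)
        if r1 is not None and r2 is not None:
            return par(r1, r2)
        return None
    return Proc(a1 | a2, tau, after)


def lco(p, B, C):
    """LCO  P^[-B+C]: the change applies only before the first event."""
    B = frozenset(B)
    return Proc((p.alpha - B) | frozenset(C), p.tau,
                lambda e: p.after(e) if e not in B else None)


def gco(p, B, C):
    """GCO  P^[[-B+C]]: the change is re-applied after every event."""
    B, C = frozenset(B), frozenset(C)
    def after(e):
        r = p.after(e) if e not in B else None
        return gco(r, B, C) if r is not None else None
    return Proc((p.alpha - B) | C, p.tau, after)


def explore(p: Proc, n: int) -> List[Tuple[Trace, Proc]]:
    """All pairs (s, P/s) with |s| <= n; candidate events come from the alphabet."""
    out, frontier = [((), p)], [((), p)]
    for _ in range(n):
        frontier = [(s + (e,), r) for s, q in frontier for e in sorted(q.alpha)
                    for r in [q.after(e)] if r is not None]
        out += frontier
    return out


def traces(p: Proc, n: int) -> Set[Trace]:
    return {s for s, _ in explore(p, n)}


def terminated(p: Proc, n: int) -> Set[Trace]:
    return {s for s, q in explore(p, n) if q.tau == 1}


# The example processes of the text.
P = choice({"a": lambda: P, "b": lambda: SKIP(())}, {"a", "b"})
R = choice({"a": lambda: SKIP(())}, {"a"})
Q = choice({"a": lambda: seq(Q, R), "b": lambda: SKIP(())}, {"a", "b"})
S = choice({"a": lambda: S, "b": lambda: SKIP(())}, {"a", "b"})
T = choice({"a": lambda: SKIP({"b"})}, {"a"})
```

With this, the examples can be checked mechanically: the successfully terminating traces of P up to length 3 are b, ab and aab; those of Q up to length 5 are b, aba and aabaa; and the traces of S ∥ T up to length 3 are the empty trace, a, aa, aaa, b and ba.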

A process Y is said to be a finitely recursive process (FRP) if it can be represented as

$$X = F(X), \qquad Y = g(X),$$

where $X = (X_1, \ldots, X_n)$ is a vector of processes, each component of $F$ is of the form

$$F_i(X) = (\sigma_{i1} \to f_{i1}(X) \mid \cdots \mid \sigma_{ik_i} \to f_{ik_i}(X))_{A_i,\tau_i},$$

and $f_{ij}$ and $g$ are composed of the operators PCO, SCO, LCO and GCO. In the example section we show how the dynamics of a manufacturing cell can be modelled using such recursive equations.

In this form, the framework lacks certain modelling features, such as timing and data variables. It has recently been extended to incorporate these features (Bose & Mukhopadhyay 1995), and a nondeterministic extension has also been proposed (Bose et al 1995b). Unfortunately, certain key properties such as boundedness and reachability become undecidable in this framework. This arises from the unrestricted use of both the parallel and the sequential composition operators. To maintain tractability, these operators must be restricted in some fashion so that computational algorithms can be found for solving specific problems. Some initial work in this direction has been reported (Bose et al 1995a, c).


4. An example 

In this section we model an automated manufacturing system taken from Zhou et al (1992) using the FRP, Statechart and TTM frameworks. The Petri Net model already given by Zhou et al (1992) is presented for comparison. An FSM model is not given, since its state space is too large to be described here.

The layout of the automated manufacturing plant is shown in figure 9. It consists of four machines, two robots, two buffers of capacities $b_1$ and $b_2$ respectively, and an assembly cell. The system first processes two types of parts, called F and G, from common raw materials, and then assembles these parts pair by pair. An F-part is first processed by machine 1, then goes to buffer 1, and is finally processed by machine 3. Machine unloading and transfer operations are carried out by robot 1. The same generic procedure is used to produce a G-part, as shown on the right side of figure 9, i.e., machine 2, buffer 2, machine 4 and robot 2 for unloading and transfer. The system requires that the raw material for an F-part go to machine 1 first, and then the raw material for a G-part go to machine 2. Finally, at the assembly stage, an F-part is required to enter the assembly station first, followed by a G-part. Robot 2 is responsible for the assembly process and for delivering the final product to the output area.

Figure 9. Layout of an automated manufacturing plant (labels: raw material; final product exit).

Figure 10. Petri Net model of the plant.

4.1 PN model 

The PN model is taken from Zhou et al (1992) and is shown in figure 10. In the figure, $m_0(p_i)$ denotes the initial marking of the place $p_i$; if unspecified, it is assumed to be zero. Note how the places correspond to the various states of activity: when a token is available in a place, the corresponding activity occurs. Since many places can contain tokens simultaneously, these activities go on concurrently. The transitions represent the start or end of operations, which cause changes of state. The places $p_1$, $p_2$, $p_3$ model the states of the controlling mechanism, which ensures that $t_1$ fires first and then $t_2$ and $t_1$ fire alternately. Places $p_4$, $p_6$, $p_8$, $p_{10}$, $p_{12}$, $p_{18}$, $p_{20}$, $p_{22}$, along with the related transitions, model the production of an F-part. A similar structure corresponds to the G-part. The assembly cell is modelled by the places $p_{14}$ to $p_{17}$ along with the transitions $t_{12}$ to $t_{14}$. The various types of logical analysis that can be carried out on this model are discussed by Zhou et al (1992).

4.2 FRP model 

While modelling the manufacturing cell using an FRP, we adopt a modular approach. Each 
of the machines, buffers, and robots, the raw material entry point and the assembly cell is 
modelled as a process. The overall system consists of these component subsystems evolving 
concurrently and is modelled by a PCO. The overall sequence is achieved by interleaving, 
with synchronization and blocking among events from the different subsystems. The names 
of the events used in the model are self-explanatory. 

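• Machine-1:

$$\mathrm{M\_1} = (\text{load-F} \to \mathrm{M\_1'})_{\{\text{load-F}\},0}$$
$$\mathrm{M\_1'} = (\text{tr-F-B-pick} \to \mathrm{M\_1})_{\{\text{tr-F-B-pick}\},0}$$

• Machine-2:

$$\mathrm{M\_2} = (\text{load-G} \to \mathrm{M\_2'})_{\{\text{load-G}\},0}$$
$$\mathrm{M\_2'} = (\text{tr-G-B-pick} \to \mathrm{M\_2})_{\{\text{tr-G-B-pick}\},0}$$

• Raw material entry point:

$$\mathrm{RM} = (\text{load-F} \to \mathrm{RM'})_{\{\text{load-F}\},0}$$
$$\mathrm{RM'} = (\text{load-G} \to \mathrm{RM})_{\{\text{load-G}\},0}$$

By synchronization, the RM process ensures that F-parts are loaded into M1 and G-parts into M2 alternately.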

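• Buffer-1:

$$\mathrm{B1\_E} = (\text{tr-F-B-drop} \to \mathrm{B1\_1})_{\{\text{tr-F-B-drop}\},0}$$
$$\mathrm{B1\_1} = (\text{tr-F-B-drop} \to \mathrm{B1\_2} \mid \text{load-M-3} \to \mathrm{B1\_E})_{\{\text{tr-F-B-drop},\,\text{load-M-3}\},0}$$
$$\vdots$$
$$\mathrm{B1\_}(b_1-1) = (\text{tr-F-B-drop} \to \mathrm{B1\_F} \mid \text{load-M-3} \to \mathrm{B1\_}(b_1-2))_{\{\text{tr-F-B-drop},\,\text{load-M-3}\},0}$$
$$\mathrm{B1\_F} = (\text{load-M-3} \to \mathrm{B1\_}(b_1-1))^{[+\{\text{tr-F-B-pick}\}]}_{\{\text{load-M-3}\},0}$$

• Buffer-2:

$$\mathrm{B2\_E} = (\text{tr-G-B-drop} \to \mathrm{B2\_1})_{\{\text{tr-G-B-drop}\},0}$$
$$\mathrm{B2\_1} = (\text{tr-G-B-drop} \to \mathrm{B2\_2} \mid \text{load-M-4} \to \mathrm{B2\_E})_{\{\text{tr-G-B-drop},\,\text{load-M-4}\},0}$$
$$\vdots$$
$$\mathrm{B2\_}(b_2-1) = (\text{tr-G-B-drop} \to \mathrm{B2\_F} \mid \text{load-M-4} \to \mathrm{B2\_}(b_2-2))_{\{\text{tr-G-B-drop},\,\text{load-M-4}\},0}$$
$$\mathrm{B2\_F} = (\text{load-M-4} \to \mathrm{B2\_}(b_2-1))^{[+\{\text{tr-G-B-pick}\}]}_{\{\text{load-M-4}\},0}$$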

Note that, owing to the lack of variables, a separate process is defined for each buffer state, so the model becomes cumbersome when the buffer capacity is large. Note also that the robots are prevented from picking up parts from the machines when the respective buffers are full; this is achieved using an LCO.

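• Machine-3:

$$\mathrm{M\_3} = (\text{load-M-3} \to \mathrm{M\_3'})_{\{\text{load-M-3}\},0}$$
$$\mathrm{M\_3'} = (\text{tr-F-A-pick} \to \mathrm{M\_3})_{\{\text{tr-F-A-pick}\},0}$$

• Machine-4:

$$\mathrm{M\_4} = (\text{load-M-4} \to \mathrm{M\_4'})_{\{\text{load-M-4}\},0}$$
$$\mathrm{M\_4'} = (\text{tr-G-A-pick} \to \mathrm{M\_4})_{\{\text{tr-G-A-pick}\},0}$$

Machines 3 and 4 are similar to machines 1 and 2 respectively.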

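• Assembly cell:

$$\mathrm{AC\_E} = (\text{tr-F-A-drop} \to \mathrm{AC\_F})_{\{\text{tr-F-A-drop}\},0}$$
$$\mathrm{AC\_F} = (\text{tr-G-A-drop} \to \mathrm{AC\_FG})^{[+\{\text{tr-F-A-pick}\}]}_{\{\text{tr-G-A-drop}\},0}$$
$$\mathrm{AC\_FG} = (\text{st-assm} \to \mathrm{AC\_W})^{[+\{\text{tr-F-A-pick},\,\text{tr-G-A-pick}\}]}_{\{\text{st-assm}\},0}$$
$$\mathrm{AC\_W} = (\text{fin-assm} \to \mathrm{AC\_E})^{[+\{\text{tr-F-A-pick}\}]}_{\{\text{fin-assm}\},0}$$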

In the assembly cell process, an LCO is again used to prevent robot 1 from picking up a part from M3 when the assembly cell is not empty. Similarly, after robot 2 has dropped a G-part in the assembly cell, it cannot pick another part from M4 until the assembly operation is completed; otherwise a deadlock would arise.

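• Robot-1:

$$\mathrm{R\_1\_I} = (\text{tr-F-B-pick} \to \mathrm{R\_1\_W1} \mid \text{tr-F-A-pick} \to \mathrm{R\_1\_W2})_{\{\text{tr-F-B-pick},\,\text{tr-F-A-pick}\},0}$$
$$\mathrm{R\_1\_W1} = (\text{tr-F-B-drop} \to \mathrm{R\_1\_I})_{\{\text{tr-F-B-drop}\},0}$$
$$\mathrm{R\_1\_W2} = (\text{tr-F-A-drop} \to \mathrm{R\_1\_I})_{\{\text{tr-F-A-drop}\},0}$$

• Robot-2:

$$\mathrm{R\_2\_I} = (\text{tr-G-B-pick} \to \mathrm{R\_2\_W1} \mid \text{tr-G-A-pick} \to \mathrm{R\_2\_W2} \mid \text{st-assm} \to \mathrm{R\_2\_W3})_{\{\text{tr-G-B-pick},\,\text{tr-G-A-pick},\,\text{st-assm}\},0}$$
$$\mathrm{R\_2\_W1} = (\text{tr-G-B-drop} \to \mathrm{R\_2\_I})_{\{\text{tr-G-B-drop}\},0}$$
$$\mathrm{R\_2\_W2} = (\text{tr-G-A-drop} \to \mathrm{R\_2\_I})_{\{\text{tr-G-A-drop}\},0}$$
$$\mathrm{R\_2\_W3} = (\text{fin-assm} \to \mathrm{R\_2\_I})_{\{\text{fin-assm}\},0}$$

Robot 1 has two working states, representing the transfer of parts from machine to buffer and from machine to assembly cell. Robot 2 has an extra working state representing the assembly of F- and G-parts in the assembly cell.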
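As noted for Buffer-1 and Buffer-2 above, FRP without variables needs one equation per buffer state. The hypothetical helper below (our own illustration, not part of the FRP framework) prints that family of equations for an arbitrary capacity $b \ge 1$ in a plain-text rendering of the notation, with `->` for the arrow and the LCO written as a `^[+{...}]` suffix, which makes the linear growth in the number of equations explicit.

```python
from typing import List


def state(prefix: str, k: int, b: int) -> str:
    # Buffer states: empty (k = 0), k parts (0 < k < b), full (k = b).
    if k == 0:
        return f"{prefix}_E"
    if k == b:
        return f"{prefix}_F"
    return f"{prefix}_{k}"


def buffer_equations(prefix: str, drop: str, load: str, pick: str, b: int) -> List[str]:
    """One FRP equation per buffer state, in a plain-text rendering."""
    eqs = []
    for k in range(b + 1):
        branches = []
        if k < b:
            branches.append((drop, state(prefix, k + 1, b)))  # a part is dropped in
        if k > 0:
            branches.append((load, state(prefix, k - 1, b)))  # a part is loaded out
        body = " | ".join(f"{e} -> {nxt}" for e, nxt in branches)
        alphabet = ",".join(e for e, _ in branches)
        # The full buffer blocks the robot's pick via the LCO [+{pick}].
        lco = f"^[+{{{pick}}}]" if k == b else ""
        eqs.append(f"{state(prefix, k, b)} = ({body}){lco}_{{{alphabet}}},0")
    return eqs


for eq in buffer_equations("B1", "tr-F-B-drop", "load-M-3", "tr-F-B-pick", 4):
    print(eq)
```

For capacity 4 this yields five equations, from `B1_E` to `B1_F`; doubling the capacity roughly doubles the number of equations.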

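Also define

$$\begin{aligned}
A_{\mathrm{MC\_1}} &= \{\text{load-F},\ \text{tr-F-B-pick}\}\\
A_{\mathrm{MC\_2}} &= \{\text{load-G},\ \text{tr-G-B-pick}\}\\
A_{\mathrm{MC\_3}} &= \{\text{load-M-3},\ \text{tr-F-A-pick}\}\\
A_{\mathrm{MC\_4}} &= \{\text{load-M-4},\ \text{tr-G-A-pick}\}\\
A_{\mathrm{B\_1}} &= \{\text{load-M-3},\ \text{tr-F-B-drop}\}\\
A_{\mathrm{B\_2}} &= \{\text{load-M-4},\ \text{tr-G-B-drop}\}\\
A_{\mathrm{R\_1}} &= \{\text{tr-F-B-pick},\ \text{tr-F-B-drop},\ \text{tr-F-A-pick},\ \text{tr-F-A-drop}\}\\
A_{\mathrm{R\_2}} &= \{\text{tr-G-B-pick},\ \text{tr-G-B-drop},\ \text{tr-G-A-pick},\ \text{tr-G-A-drop},\ \text{st-assm},\ \text{fin-assm}\}\\
A_{\mathrm{RM}} &= \{\text{load-F},\ \text{load-G}\}\\
A_{\mathrm{AC}} &= \{\text{tr-F-A-drop},\ \text{tr-G-A-drop},\ \text{st-assm},\ \text{fin-assm}\}
\end{aligned}$$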



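Now the dynamics of the overall manufacturing cell are given by

$$\begin{aligned}
\mathrm{M\_Cell} = {} & (\mathrm{M\_1})^{[[+A_{\mathrm{MC\_1}}]]} \| (\mathrm{M\_2})^{[[+A_{\mathrm{MC\_2}}]]} \| (\mathrm{M\_3})^{[[+A_{\mathrm{MC\_3}}]]} \| (\mathrm{M\_4})^{[[+A_{\mathrm{MC\_4}}]]}\\
& \| (\mathrm{B1\_E})^{[[+A_{\mathrm{B\_1}}]]} \| (\mathrm{B2\_E})^{[[+A_{\mathrm{B\_2}}]]} \| (\mathrm{R\_1\_I})^{[[+A_{\mathrm{R\_1}}]]} \| (\mathrm{R\_2\_I})^{[[+A_{\mathrm{R\_2}}]]}\\
& \| (\mathrm{RM})^{[[+A_{\mathrm{RM}}]]} \| (\mathrm{AC\_E})^{[[+A_{\mathrm{AC}}]]}.
\end{aligned}$$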

Note the use of GCO in achieving a constant synchronizing alphabet while keeping the 
core process description modular and independent. 
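For components whose own events all lie in their GCO set, such as the machines and RM, the GCO fixes the alphabet at every step, so their composition behaves like a CSP-style synchronous product: an event can occur only if every component whose alphabet contains it can engage in it. The sketch below illustrates this for just M_1, M_2 and RM, encoded by us as small transition tables (an assumption; the buffers, whose LCOs make the alphabet vary, and the other components are omitted). It shows that load-G cannot occur before load-F, which is the alternation enforced by RM.

```python
from typing import Dict, FrozenSet, List, Optional, Set, Tuple

# (transition table, constant alphabet, initial state) of one component
Component = Tuple[Dict[str, Dict[str, str]], FrozenSet[str], str]


def step(comps: List[Component], states: Tuple[str, ...], e: str) -> Optional[Tuple[str, ...]]:
    """Event e occurs only if every component whose alphabet contains e can do e."""
    nxt = list(states)
    for i, (delta, alpha, _) in enumerate(comps):
        if e in alpha:
            if e not in delta[states[i]]:
                return None                # this component blocks e
            nxt[i] = delta[states[i]][e]
    return tuple(nxt)


def traces(comps: List[Component], n: int) -> Set[Tuple[str, ...]]:
    events = sorted(set().union(*(alpha for _, alpha, _ in comps)))
    result = {()}
    frontier = [((), tuple(init for _, _, init in comps))]
    for _ in range(n):
        new = []
        for s, st in frontier:
            for e in events:
                r = step(comps, st, e)
                if r is not None:
                    new.append((s + (e,), r))
        result |= {s for s, _ in new}
        frontier = new
    return result


M1: Component = ({"M_1": {"load-F": "M_1'"}, "M_1'": {"tr-F-B-pick": "M_1"}},
                 frozenset({"load-F", "tr-F-B-pick"}), "M_1")
M2: Component = ({"M_2": {"load-G": "M_2'"}, "M_2'": {"tr-G-B-pick": "M_2"}},
                 frozenset({"load-G", "tr-G-B-pick"}), "M_2")
RM: Component = ({"RM": {"load-F": "RM'"}, "RM'": {"load-G": "RM"}},
                 frozenset({"load-F", "load-G"}), "RM")

tr = traces([M1, M2, RM], 2)
```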


4.3 SC model 

The same event symbols used in the FRP model can also be used in the SC model. Since the formalism is graphical, the model is shown in figure 11. Table 1 lists the events and their associated event conditions; the event symbols in the table are used in the figure.

Note that the earlier problem of modelling buffers with large capacity remains, perhaps more so in a graphical environment. Otherwise, the graphical nature and the event conditions based on the active states of the various subcomponents make modelling easy. However, providing graphical support is a problem in itself. Moreover, there is no indication of how the event conditions would be verified in an actual physical implementation. Thus this description is at a somewhat higher level than that of FRPs or TTMs.
Figure 11. Statechart model of the plant.
Table 1. Event names, symbols and associated conditions for the statechart model.

Event name     Event condition           Event symbol
load-F         F-avlbl                   a
load-G         G-avlbl                   b
tr-F-B-pick    R1-I ∧ M1-W ∧ ~B1-F       c
tr-G-B-pick    R2-I ∧ M2-W ∧ ~B2-F       d
tr-F-B-drop    R1-W1                     e
tr-G-B-drop    R2-W1                     f
load-M-3       (~B1-E) ∧ M3-I            g
load-M-4       (~B2-E) ∧ M4-I            h
tr-F-A-pick    R1-I ∧ M3-W ∧ AC-E        i
tr-G-A-pick    R2-I ∧ M4-W ∧ AC-F        j
tr-F-A-drop    R1-W2                     k
tr-G-A-drop    R2-W2                     l
st-assm        R2-I ∧ AC-FG              m
fin-assm       true                      n
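The conditions of Table 1 are Boolean expressions over the set of currently active basic states, so they can be evaluated mechanically. The sketch below is our own illustration: the state names follow the table, but representing a configuration as a set of state names is an assumption (figure 11 is not reproduced here), and it only checks whether each condition holds, not whether the statechart has a transition labelled with that event from the active state.

```python
from typing import Callable, Dict, FrozenSet, List

Active = FrozenSet[str]  # names of the currently active basic states

# Event conditions of Table 1, evaluated on the set of active states.
CONDITIONS: Dict[str, Callable[[Active], bool]] = {
    "load-F":      lambda s: "F-avlbl" in s,
    "load-G":      lambda s: "G-avlbl" in s,
    "tr-F-B-pick": lambda s: {"R1-I", "M1-W"} <= s and "B1-F" not in s,
    "tr-G-B-pick": lambda s: {"R2-I", "M2-W"} <= s and "B2-F" not in s,
    "tr-F-B-drop": lambda s: "R1-W1" in s,
    "tr-G-B-drop": lambda s: "R2-W1" in s,
    "load-M-3":    lambda s: "B1-E" not in s and "M3-I" in s,
    "load-M-4":    lambda s: "B2-E" not in s and "M4-I" in s,
    "tr-F-A-pick": lambda s: {"R1-I", "M3-W", "AC-E"} <= s,
    "tr-G-A-pick": lambda s: {"R2-I", "M4-W", "AC-F"} <= s,
    "tr-F-A-drop": lambda s: "R1-W2" in s,
    "tr-G-A-drop": lambda s: "R2-W2" in s,
    "st-assm":     lambda s: {"R2-I", "AC-FG"} <= s,
    "fin-assm":    lambda s: True,
}


def satisfied(active: Active) -> List[str]:
    """Events whose Table 1 condition holds in the given configuration."""
    return sorted(e for e, cond in CONDITIONS.items() if cond(active))


# A hypothetical configuration: machine 1 has finished, both buffers are empty.
config = frozenset({"G-avlbl", "R1-I", "M1-W", "B1-E", "M3-I",
                    "R2-I", "M2-I", "B2-E", "M4-I", "AC-E"})
print(satisfied(config))
```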


4.4 TTM model 

The TTM model is shown in figure 12. Here again, the same modular approach as for FRP and SC is used: all components are modelled individually as TTMs, and the overall system is the parallel composition of the individual TTMs. Even though TTMs can model real-time constraints, in the present example we have ignored the real-time behaviour of the system, and the time bounds $l_\tau$ and $u_\tau$ have been chosen as 0 and ∞ respectively for all transitions. The initial and tick transitions are common to all the components. The initial transition does not change any state and merely initiates the event sequence of a TTM, and tick is the only transition that increments the global clock variable by one unit. The event symbols given in Table 1 are used again. In addition, we use two pairs of communication transitions and two more transitions, namely Check-c1 (symbol o) and Check-c2 (symbol p).

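• Raw material entry point (RM):

$$V_{RM} = \{x_{RM}, \eta, t\}$$
$$\mathcal{T}_{RM} = \{\text{load-F}, \text{load-G}, \text{initial}, \text{tick}\}$$
$$\Theta_{RM} = (\eta = \text{initial} \wedge x_{RM} = \text{Favlbl})$$

$$\begin{array}{lll}
\tau & e_\tau & h_\tau \\ \hline
\text{load-F} & x_{RM} = \text{Favlbl} & x_{RM} := \text{Gavlbl} \\
\text{load-G} & x_{RM} = \text{Gavlbl} & x_{RM} := \text{Favlbl}
\end{array}$$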


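• Machine-1 (M1):

$$V_{M1} = \{x_{M1}, \eta, t\}$$
$$\mathcal{T}_{M1} = \{\text{load-F}, \text{tr-F-B-pick}, \text{initial}, \text{tick}\}$$
$$\Theta_{M1} = (\eta = \text{initial} \wedge x_{M1} = I)$$

$$\begin{array}{lll}
\tau & e_\tau & h_\tau \\ \hline
\text{load-F} & x_{M1} = I & x_{M1} := W \\
\text{tr-F-B-pick} & x_{M1} = W & x_{M1} := I
\end{array}$$

Figure 12. Timed transition model of the plant.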


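• Machine-2 (M2):

$$V_{M2} = \{x_{M2}, \eta, t\}$$
$$\mathcal{T}_{M2} = \{\text{load-G}, \text{tr-G-B-pick}, \text{initial}, \text{tick}\}$$
$$\Theta_{M2} = (\eta = \text{initial} \wedge x_{M2} = I)$$

$$\begin{array}{lll}
\tau & e_\tau & h_\tau \\ \hline
\text{load-G} & x_{M2} = I & x_{M2} := W \\
\text{tr-G-B-pick} & x_{M2} = W & x_{M2} := I
\end{array}$$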
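• Buffer-1 (B1):

$$V_{B1} = \{x_{B1}, y, \eta, t\}$$
$$\mathcal{T}_{B1} = \{\text{load-M3}, \text{tr-F-B-drop}, c1!x_{B1}, \text{initial}, \text{tick}\}$$
$$\Theta_{B1} = (\eta = \text{initial} \wedge x_{B1} = E \wedge y = 0)$$

$$\begin{array}{lll}
\tau & e_\tau & h_\tau \\ \hline
c1!x_{B1} & \text{true} & [\,] \\
\text{load-M3} & \neg(x_{B1} = E) & y := y - 1;\ (1 \le y < b_1) \Rightarrow x_{B1} := PF;\ (y = 0) \Rightarrow x_{B1} := E \\
\text{tr-F-B-drop} & \neg(x_{B1} = F) & y := y + 1;\ (1 \le y < b_1) \Rightarrow x_{B1} := PF;\ (y = b_1) \Rightarrow x_{B1} := F
\end{array}$$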


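• Buffer-2 (B2):

$$V_{B2} = \{x_{B2}, z, \eta, t\}$$
$$\mathcal{T}_{B2} = \{\text{load-M4}, \text{tr-G-B-drop}, c2!x_{B2}, \text{initial}, \text{tick}\}$$
$$\Theta_{B2} = (\eta = \text{initial} \wedge x_{B2} = E \wedge z = 0)$$

$$\begin{array}{lll}
\tau & e_\tau & h_\tau \\ \hline
c2!x_{B2} & \text{true} & [\,] \\
\text{load-M4} & \neg(x_{B2} = E) & z := z - 1;\ (1 \le z < b_2) \Rightarrow x_{B2} := PF;\ (z = 0) \Rightarrow x_{B2} := E \\
\text{tr-G-B-drop} & \neg(x_{B2} = F) & z := z + 1;\ (1 \le z < b_2) \Rightarrow x_{B2} := PF;\ (z = b_2) \Rightarrow x_{B2} := F
\end{array}$$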


The important features to be noted here are: (i) use of data variables in buffer definitions 
which results in a compact description; (ii) use of communication channels, through which 
buffers are always ready to communicate their states, and (iii) piecewise constant nature 
of transformation functions defined in the “Buffer” TTMs. 

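• Machine-3 (M3):

$$V_{M3} = \{x_{M3}, \eta, t\}$$
$$\mathcal{T}_{M3} = \{\text{load-M3}, \text{tr-F-A-pick}, \text{initial}, \text{tick}\}$$
$$\Theta_{M3} = (\eta = \text{initial} \wedge x_{M3} = I)$$

$$\begin{array}{lll}
\tau & e_\tau & h_\tau \\ \hline
\text{load-M3} & x_{M3} = I & x_{M3} := W \\
\text{tr-F-A-pick} & x_{M3} = W & x_{M3} := I
\end{array}$$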
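• Machine-4 (M4):

$$V_{M4} = \{x_{M4}, \eta, t\}$$
$$\mathcal{T}_{M4} = \{\text{load-M4}, \text{tr-G-A-pick}, \text{initial}, \text{tick}\}$$
$$\Theta_{M4} = (\eta = \text{initial} \wedge x_{M4} = I)$$

$$\begin{array}{lll}
\tau & e_\tau & h_\tau \\ \hline
\text{load-M4} & x_{M4} = I & x_{M4} := W \\
\text{tr-G-A-pick} & x_{M4} = W & x_{M4} := I
\end{array}$$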


As in FRP and SC, the four machine models possess similar (discrete) dynamics. 
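• Robot-1 (R1):

$$V_{R1} = \{x_{R1}, q_{B1}, x_{AC}, \eta, t\}$$
$$\mathcal{T}_{R1} = \{\text{tr-F-B-pick}, \text{tr-F-B-drop}, \text{tr-F-A-pick}, \text{tr-F-A-drop}, \text{Check-c1}, c1?q_{B1}, \text{initial}, \text{tick}\}$$
$$\Theta_{R1} = (\eta = \text{initial} \wedge x_{R1} = I)$$

$$\begin{array}{lll}
\tau & e_\tau & h_\tau \\ \hline
\text{Check-c1} & x_{R1} = A \wedge q_{B1} = F & x_{R1} := I \\
c1?q_{B1} & x_{R1} = I & (q_{B1} := x_{B1}) \wedge (x_{R1} := A) \\
\text{tr-F-B-pick} & (x_{R1} = A) \wedge \neg(q_{B1} = F) & x_{R1} := W1 \\
\text{tr-F-B-drop} & x_{R1} = W1 & x_{R1} := I \\
\text{tr-F-A-pick} & (x_{R1} = A \vee x_{R1} = I) \wedge (x_{AC} = E) & x_{R1} := W2 \\
\text{tr-F-A-drop} & x_{R1} = W2 & x_{R1} := I
\end{array}$$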


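• Robot-2 (R2):

$$V_{R2} = \{x_{R2}, q_{B2}, \eta, t\}$$
$$\mathcal{T}_{R2} = \{\text{tr-G-B-pick}, \text{tr-G-B-drop}, \text{tr-G-A-pick}, \text{tr-G-A-drop}, \text{st-assm}, \text{fin-assm}, \text{Check-c2}, c2?q_{B2}, \text{initial}, \text{tick}\}$$
$$\Theta_{R2} = (\eta = \text{initial} \wedge x_{R2} = I)$$

$$\begin{array}{lll}
\tau & e_\tau & h_\tau \\ \hline
\text{Check-c2} & x_{R2} = A \wedge q_{B2} = F & x_{R2} := I \\
c2?q_{B2} & x_{R2} = I & (q_{B2} := x_{B2}) \wedge (x_{R2} := A) \\
\text{tr-G-B-pick} & (x_{R2} = A) \wedge \neg(q_{B2} = F) & x_{R2} := W1 \\
\text{tr-G-B-drop} & x_{R2} = W1 & x_{R2} := I \\
\text{tr-G-A-pick} & x_{R2} = A \vee x_{R2} = I & x_{R2} := W2 \\
\text{tr-G-A-drop} & x_{R2} = W2 & x_{R2} := I \\
\text{st-assm} & x_{R2} = A \vee x_{R2} = I & x_{R2} := W3 \\
\text{fin-assm} & x_{R2} = W3 & x_{R2} := I
\end{array}$$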
Note that in R1 and R2 the local variables $q_{B1}$ and $q_{B2}$ are assigned the values of the corresponding buffer states via communication events. On the other hand, the activity variable $x_{AC}$ of the assembly cell is modelled as a variable shared between AC and R1. Further, note that extra states and transitions have become necessary in both robot models because of the explicit communication through events.

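• Assembly cell (AC):

$$V_{AC} = \{x_{AC}, \eta, t\}$$
$$\mathcal{T}_{AC} = \{\text{tr-F-A-drop}, \text{tr-G-A-drop}, \text{st-assm}, \text{fin-assm}, \text{initial}, \text{tick}\}$$
$$\Theta_{AC} = (\eta = \text{initial} \wedge x_{AC} = E)$$

$$\begin{array}{lll}
\tau & e_\tau & h_\tau \\ \hline
\text{tr-F-A-drop} & x_{AC} = E & x_{AC} := F \\
\text{tr-G-A-drop} & x_{AC} = F & x_{AC} := FG \\
\text{st-assm} & x_{AC} = FG & x_{AC} := B \\
\text{fin-assm} & x_{AC} = B & x_{AC} := E
\end{array}$$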


Finally, the manufacturing cell model is given by

$$\mathrm{M\_Cell} = \mathrm{M1} \| \mathrm{M2} \| \mathrm{M3} \| \mathrm{M4} \| \mathrm{B1} \| \mathrm{B2} \| \mathrm{R1} \| \mathrm{R2} \| \mathrm{RM} \| \mathrm{AC}.$$

This appears similar to the case of FRP. However the role played by GCOs in the FRP 
model is automatically fulfilled by parallel composition of TTMs, as the transitions sharing 
identical names get composed into a single unique transition, bearing a combined enabling 
condition, under concurrent operation. Such synchronization is however constant unlike 
in FRP where it is dependent on the local alphabet. 
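The fusion of identically named transitions can be sketched by combining the Buffer-1 and Machine-3 TTMs above, which share load-M3. This is a simplified illustration under our own assumptions: each transition is a pair (enabling condition, transformation) over a shared dictionary of variables; a fused transition takes the conjunction of the enabling conditions and applies all transformations to the pre-transition state (assuming they update disjoint variables); and the η variable, the clock, the time bounds and the tick transition are omitted. It also shows how the data variable y keeps the buffer description compact.

```python
from typing import Callable, Dict, Tuple

State = Dict[str, object]
Transition = Tuple[Callable[[State], bool], Callable[[State], State]]


def buffer1(b1: int) -> Dict[str, Transition]:
    """Buffer-1 TTM: activity variable x_B1 and part count y."""
    def load(s):
        y = s["y"] - 1
        return {"y": y, "x_B1": "PF" if 1 <= y < b1 else "E"}
    def drop(s):
        y = s["y"] + 1
        return {"y": y, "x_B1": "PF" if 1 <= y < b1 else "F"}
    return {"load-M3": (lambda s: s["x_B1"] != "E", load),
            "tr-F-B-drop": (lambda s: s["x_B1"] != "F", drop)}


MACHINE3: Dict[str, Transition] = {
    "load-M3": (lambda s: s["x_M3"] == "I", lambda s: {"x_M3": "W"}),
    "tr-F-A-pick": (lambda s: s["x_M3"] == "W", lambda s: {"x_M3": "I"}),
}


def compose(*components: Dict[str, Transition]) -> Dict[str, Transition]:
    """Fuse identically named transitions: AND of conditions, union of updates."""
    fused = {}
    for name in set().union(*components):
        parts = [c[name] for c in components if name in c]
        def guard(s, parts=parts):
            return all(g(s) for g, _ in parts)
        def action(s, parts=parts):
            new = dict(s)
            for _, h in parts:
                new.update(h(s))  # every part reads the pre-transition state
            return new
        fused[name] = (guard, action)
    return fused


cell = compose(buffer1(3), MACHINE3)
s0 = {"x_B1": "E", "y": 0, "x_M3": "I"}
```

In `s0` the fused load-M3 is disabled because the buffer is empty, although Machine-3 alone would accept a part; after one tr-F-B-drop both conditions hold and a single fused step updates the buffer and the machine together.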

4.5 Remarks 

The exercise of modelling the automated manufacturing system in different frameworks clearly brings out the following points.

(i) Synchronization is a major issue in modelling and is tackled differently in different 
models. 

In PN model synchronisation is modelled by enabling a transition with multiple 
input places. So any system having some internal synchronisation has to be modelled 
as a whole, as if it is hardwired. 

In the FRP model, the PCO models concurrency. The variable alphabet of processes is both a source of modelling power and a cause of complexity in this model. Note how the LCO [+{tr-F-B-pick}] is used in the definition of buffer 1 to prevent robot 1 from picking up a part from machine 1 when the buffer is full. Also, a GCO is applied to each component in the final definition of the manufacturing cell to force global synchronization among the different events. As mentioned before, we keep the basic component models unchanged, i.e., they retain their independent operation, and we impose additional synchronization through LCOs and GCOs whenever required. From the point of view of maintainability and reusability of models, this is preferable to including the extra events in the local alphabets of the core models everywhere.
In the SC model, as in FRP, mere identity of names does not ensure synchronization. However, instead of LCOs and GCOs, state-based conditions are used to ensure synchronization as well as blocking.

In TTM, however, identically named transitions are destined to occur synchronously. 
This is because such transitions are fused into a new synchronized transition in the 
concurrent combination. As a result, unlike SC, conditions defined on most of the 
component transitions are local. 

(ii) Communication between processes is explicitly modelled only in TTM, using both communication via channels (c1 and c2) and shared variables ($x_{AC}$ in $V_{R1}$). PN, SC and FRP have no such features.

(iii) Modelling of the buffers shows interesting differences between the models. In PN, the presence of 'tokens' takes care of the numerical value of the buffer capacity. In TTM there is provision for explicit data variables. In SC and FRP, however, modelling a buffer is cumbersome: in SC the number of basic states (under the superstates Bi-PF, i = 1, 2), and correspondingly in FRP the number of equations describing the two buffers, increases with the buffer capacity.

(iv) Hierarchy is explicitly treated in the SC framework. In TTM and FRP it is achieved via modular design of components and the PCO. In the final PN model the presence of hierarchy is not apparent; Zhou et al (1992) have shown how the final model is arrived at via stepwise refinement from simpler PN models. This stepwise refinement strategy is, however, not inherent in the PN modelling tools and has to be designed by the modeller to suit the purpose at hand.

(v) Other than TTM and SC, none of the above models can capture real-time features. However, we have ignored timing features by taking the lower and upper time bounds of all transitions as 0 and ∞ respectively. It may be mentioned that performance questions related to timing are difficult to answer analytically and are usually dealt with by exhaustive automated simulation.

(vi) An FSM model for the example becomes unmanageable because of the large size of its state space, as the following argument shows.

From the statechart model in figure 11, we note that even for buffers of capacity four, the product automaton with no synchronization has 2×2×2×2×5×5×3×4×2×4 = 38400 states. Because of the synchronization present, only a fraction of these states occur in the concurrent behaviour. It is difficult to estimate the exact number of states manually, but even if only about ten percent of the states remain in the final model, that would be a staggering 3840 or so. Naturally, the FSM model has not been provided here.
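The state count in remark (vi) is simply the product of the component state counts. The sketch below reproduces it; the per-component counts are our assumption, taken from the FRP model of section 4.2 (two states per machine, b + 1 per buffer, three for robot 1, four for robot 2, two for RM and four for the assembly cell), and it shows how the count scales with the buffer capacities.

```python
from math import prod


def product_states(b1: int, b2: int) -> int:
    """Size of the unsynchronized product automaton (assumed component sizes)."""
    counts = {
        "M1": 2, "M2": 2, "M3": 2, "M4": 2,  # each machine: idle or working
        "B1": b1 + 1, "B2": b2 + 1,          # empty, 1 .. b-1 parts, full
        "R1": 3,                             # idle and two working states
        "R2": 4,                             # idle and three working states
        "RM": 2,                             # next part to load is F or G
        "AC": 4,                             # AC-E, AC-F, AC-FG, AC-W
    }
    return prod(counts.values())


n = product_states(4, 4)
print(n, n // 10)  # the full product and roughly ten percent of it
```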

5. Comparison and conclusions 

5.1 Comparison 

We have come across various types of modelling frameworks, each of which has certain strengths as well as weaknesses. The more powerful a modelling framework is, the more difficult it is to answer questions of reachability, boundedness, etc. in that setting. In fact, many such questions are undecidable for the more powerful models. To give an overview of the various modelling frameworks, we present a table at the end of this section comparing them on the basis of the answers to the following questions:

Q1. Is the reachability problem decidable? 

Q2. Is the liveness/deadlock problem decidable? 

Q3. What is the language complexity? 

Q4. Can it handle nondeterministic behaviour? 

Q5. Does it support hierarchical structure? 

Q6. Is the problem of control well-solved? 

Q7. Is the problem of observation well-solved? 

Q8. Is the problem of stabilization well-solved? 

Q9. Can real-time features be handled? 

Q10. Do there exist good analytical tools for analysing the model? 

The answers in most cases are Y (Yes), QY (Qualified Yes), N (No) and ? (Unknown). Question 3, however, has the following possible answers: R (Regular), CF (Context free), CS (Context sensitive) and RE (Recursively enumerable). The frameworks compared are FSM, TTM, PN, Statecharts and FRP. Since we are often concerned with finite state versions of the first four systems, most of the questions related to them are decidable. Things are exactly the opposite for the last three. Work on control and stabilization has so far been restricted to the first four, although these problems are not yet well-solved in all of them. Hierarchy is supported explicitly and inherently only in the Statecharts model, while real-time features are inherent only in TTM. Such features have, however, been imposed on some of the other frameworks as well. Finally, graph and set-theoretic tools are available for the first four frameworks, in addition to temporal logic for TTM, while for the last three only logic and proof rules can be applied.


Model  Q1  Q2  Q3  Q4  Q5  Q6  Q7  Q8  Q9  Q10
FSM    Y   Y   R   Y   N   Y   Y   Y   QY  Y
TTM    QY  QY  RE  Y   QY  QY  N   N   Y   Y
PN     Y   Y   RE  Y   N   Y   QY  QY  Y   Y
SC     Y   QY  ?   Y   Y   QY  QY  QY  QY  Y
FRP    N   N   RE  N   QY  N   N   N   QY  QY


5.2 Conclusion and future scope 

It is too early to say anything definitive about the future of DES research. While efforts are 
on to equip existing frameworks with new modelling features, a single model is unlikely 
to be appropriate for all situations. Experience of modelling large real-life systems is still 
lacking. Further developments in this field should be driven by practical problems. 
Some of the major issues that need to be addressed are classified as follows:

• Implementation: (i) How to handle the mass of information about a complex system and arrive at the necessary level of abstraction. (ii) How to move up and down the abstraction hierarchy and automatically generate low-level details from high-level specifications and other data. (iii) How to represent the derived model in the computer in the most appropriate form, so that the encoded information can be used in some optimal way. (iv) How to simulate model behaviour.

• Specification and verification: (i) How to check whether the behaviour of the model conforms to certain hypotheses. (ii) How to specify the hypotheses themselves so that they are easy to verify.

• Control : If the system does not satisfy the requirement, how to change its behaviour so 
that it does. 

• Observation : (i) How to estimate system behaviour in the face of partial observation, 
(ii) How to control such a situation, (iii) What is the role of nondeterminism? 

These questions, and many others, should, we feel, keep the field active for some years to come.
Abstract. We present the architecture of a second-generation expert system for the automated design of microprocessor-based systems. A novel feature is the integration of a device handbook knowledge base with a shallow expert system, to provide resilience and deep reasoning capability. The design tasks, knowledge sources, behaviour modelling scheme, behaviour mapping algorithms, and inter-layer communication are briefly described.

soning; device modelling; incomplete knowledge.


1. Introduction 

Today, microprocessor-based systems are widely used in industrial applications because of their low cost, programmability, and the ready availability of supporting peripherals. In contrast, purely hardware solutions (e.g. ASIC-based) have much higher costs and longer design periods, and purely software solutions are often unable to meet the real-time constraints of the application.

Computer-aided design of microprocessor-based systems requires the design of the hardware configuration along with the design of the application-specific software and the device drivers. Design automation research in the last decade has concentrated primarily on either hardware design or software design, and has neglected the key area of cooperative design of hardware and software. However, increasing industrial demand for shorter design periods has, during the last three years, led the international research community to explore methodologies for automating the codesign of mixed (hardware-software) systems (Srivastava & Brodersen 1991; Kalavade & Lee 1993; Kumar et al 1993; Smailagic & Siewiorek 1993; Chou et al 1994; Hu et al 1994).
For designing microprocessor-based systems, it is essential to be able to interpret adequately the semantics of the requirements in the problem specification, as well as those of the external environment with which the target system is to interact. Purely syntactic interpretation, as in Boolean synthesis, is not sufficient for this domain. Hence a knowledge-based approach is the appropriate paradigm for designing such systems. This approach also facilitates the encoding of design heuristics, traditionally used by human experts, in the form of situation-action rules. Explicit recourse to all this domain knowledge for solving each and every design problem, however, turns out to be very costly and impractical. From this perspective, the expert system approach provides a flexible solution.

However, it is well known that the power of a knowledge-based system depends strongly on the extent of the knowledge encoded in it. Although shallow knowledge-based systems perform reasonably well when the solution lies within the boundary of the encoded knowledge, they fail to generate a solution if the association between the relevant problem subgoal and the desired solution has not been encoded in the system. In contrast, second-generation expert systems (Keravnou & Washbrook 1989) resort to deep (causal or behavioural) knowledge when shallow knowledge fails to find a solution. Such an integration of deep knowledge with shallow knowledge is necessary to make a knowledge-based design system robust.

The use of second generation expert systems is of special relevance in the domain of
microprocessor-based systems, where the repertoire of devices is growing at a regular pace.
When a human designer fails to achieve an elegant solution with the devices known from
experience, he/she consults device manuals and application notes to find new devices which
can solve the desired design subgoal. The design expert system should be able to mimic
the human designer’s capability of analysing the behaviour of the desired subgoal and of the
available devices, and of generating the necessary interface to the device.

The present article is a report on the first Indian effort in this direction. We have developed
a knowledge-based CAD framework for automating the design of microprocessor-based
systems. In this paper we propose a knowledge-based system with a novel two-layer
architecture. The first layer is a shallow layer, which synthesizes the target systems
using precompiled heuristic transformation rules and associated procedures. Whenever
such precompiled knowledge turns out to be insufficient to solve a design subgoal, the
second layer, the deep layer, is invoked. The deep layer is endowed with behavioural
knowledge about the microprocessor peripheral devices and the different design
functions. This layer performs behavioural (model-based) reasoning to find a solution
for the “failed” subgoal, and returns the result to the shallow layer, which then continues
with the design tasks. Thus, the two-layer architecture provides a resilient environment for
design.

In this paper, our emphasis is on presenting the integrated environment, and on the
communication and cooperation between the two layers. For completeness, we also briefly
present the salient features of the individual layers. Details of the shallow layer can be found
in Mitra et al (1993, 1994), and the description of the behavioural mapping algorithms can
be found in Mitra et al (1996).
2. Related works

Automated hardware-software codesign has been used in a variety of applications: DSP
(Kalavade & Lee 1993), wearable computers (Smailagic & Siewiorek 1993), automobile
control (Hu et al 1994), robot control (Srivastava & Brodersen 1991) etc. Two key issues
in hardware-software codesign are hardware-software partitioning (Kumar et al 1993)
and hardware-software interface synthesis (Chou et al 1994). A brief survey of the issues
and approaches in hardware-software codesign is available in Micheli (1994). Hardware-software
codesign is essentially a system-level design problem, and hence draws on past
research efforts in high-level synthesis (McFarland et al 1990), especially on the issues
of partitioning, allocation, scheduling and hardware synthesis, and on those in automated
synthesis of domain-specific software (Barstow 1985; Jullig 1993).

AI techniques have been used effectively to solve the complex problem of system design.
Chippe (Brewer & Gajski 1990, 1991) is a hybrid expert system for design that
encodes analysis knowledge in the form of rules and implementation knowledge in the
form of procedures, resulting in fast execution. Other noteworthy CAD systems for the
design of computer configurations and circuits are MICON (Tseng & Siewiorek 1986),
R1 (McDermott 1982) and VEXED (Mitchell et al 1985).

A number of researchers have investigated the issue of integrating deep knowledge
about the device into design and diagnosis problems (Keller et al 1990; Keuneke 1991).
Most of these approaches use a structure-function-behaviour model of the device as a form
of deep knowledge. The structural model is based on the physical organization of the
components. The function of the device is its intended purpose; the functional specification
describes the device’s goals at an abstract level. Functions are achieved by behaviours.
In other words, function is what is expected and behaviour is how this expected result is
achieved (Keuneke 1991). Behaviour is often represented as a causal sequence of transitions
between partial states.

3. System overview 

As mentioned earlier, we have developed a two-layer architecture for a knowledge-based 
design system, in order to allow the design process to have the advantages of shallow 
as well as deep reasoning. On the occurrence of a failure in the shallow layer, the deep 
layer is invoked to resolve the failure. If that succeeds, the shallow layer continues with 
its processing, using the data returned by the deep layer. If, however, the deep layer fails 
to generate the required solution, the shallow layer backtracks and tries to find another 
problem decomposition - one that does not contain the failed subgoal. 
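The control flow just described can be sketched in a few lines of Python. This is a minimal illustration under simplifying assumptions, not the authors' implementation: the alternative decompositions of a design goal are given as a list, the S-layer's shallow knowledge is modelled as a dictionary from subgoal to implementation, and the D-layer as a function that returns an implementation or None. All names (`synthesize`, `deep_layer`, the subgoal and implementation strings) are hypothetical.

```python
from typing import Callable, Dict, List, Optional

def synthesize(
    decompositions: List[List[str]],
    shallow_rules: Dict[str, str],
    deep_layer: Callable[[str], Optional[str]],
) -> Optional[Dict[str, str]]:
    """Two-layer control loop: S-layer first, D-layer on failure, else backtrack."""
    for subgoals in decompositions:          # alternative problem decompositions
        design: Dict[str, str] = {}
        for goal in subgoals:
            impl = shallow_rules.get(goal)   # S-layer: precompiled (shallow) knowledge
            if impl is None:
                impl = deep_layer(goal)      # failure report passed to the D-layer
            if impl is None:
                break                        # D-layer failed too: try another decomposition
            design[goal] = impl
        else:
            return design                    # every subgoal has an implementation
    return None                              # no decomposition could be completed


# Usage with hypothetical subgoal and implementation names.
rules = {"sample": "adc-impl", "store": "ram-impl"}
deep = lambda goal: "timer-impl" if goal == "count" else None
result = synthesize([["sample", "filter"], ["sample", "count", "store"]], rules, deep)
```

Here the first decomposition fails on "filter" (neither layer knows it), so the loop backtracks to the second decomposition, where "count" is supplied by the deep layer.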

The two-layer architecture of our synthesis system is shown in figure 1. The S-layer is 
the expert system (christened MICKEY) that performs the synthesis tasks with the help 
of shallow design knowledge compiled by the human expert. This layer uses precompiled 
shallow knowledge about implementation of specific design functions in order to translate 
the problem specifications into the target system’s hardware and software. This layer is 
a hybrid expert system (Brewer & Gajski 1990; Bailey et al 1991; Kambhampati et al 
1993), in which several knowledge sources, rule-based as well as procedural, interact and 
contribute towards generating the solution. 
Figure 1. Overall system architecture.


However, knowledge is seldom complete. When the S-layer detects that it does not 
have any knowledge about implementing some design subgoal, it invokes the D-layer 
(called MINNIE) by passing the details of the “failed” subgoal. The D-layer contains the 
behavioural models of available devices, organized in the form of a database. This layer 
also contains procedures for retrieving relevant device information from the database, 
and for mapping the behaviour of the failed subgoal to the behaviours of the retrieved 
devices. Thus, the database of the D-layer, coupled with the mapping algorithms, serves as
a deep knowledge-based system which can find solutions from first principles. It may be
noted that the purpose of the D-layer is essentially to mimic a human designer who, when
running short of experiential knowledge, consults device handbooks to arrive at a solution.
On successfully finding a device that can implement the failed subgoal, the D-layer passes
the specifications of the device’s interface to the S-layer, which then continues with the
synthesis tasks.

The S-layer and the D-layer together form a two-layer knowledge-based architecture 
for the synthesis of microprocessor-based systems. The integration of the D-layer with the 
S-layer aims at making the design system resilient. 


4. The S-layer 

The S-layer translates the problem specifications into the target system’s hardware and 
software, by using precompiled design knowledge. A schematic overview of the S-layer 
is shown in figure 2, which depicts the sequence of design tasks, the different knowledge 
sources used, the supporting subtasks, and the blackboard-like shared data structure for 
storing partial design results. 




Knowledge-based CAD framework 


723 


Figure 2. Architecture of S-layer. (Design tasks, in sequence, starting from the user: Specification Acquisition, Algorithm Design, Architecture Design, Interface Design, Software Design, Circuit Design, Simulation. Knowledge sources: Domain Knowledge, Application World Knowledge, Component Knowledge, Programming Knowledge. Supporting subtasks: Constraint Propagator, Conflict Detector, Conflict Resolver. Partial design information: Specs., CDFG, Allocations & Schedule, CDFG modifications, CDFG (in SpeX), Module Hierarchy, Allocation Tables, Refinement History, Layout Geometry, Software, Hardware.)


4.1 The design tasks 

In order to synthesize the target system, several tasks have to be performed: the acquisition
of the specifications from the user, functional decomposition, hardware-software
partitioning, interface synthesis, software synthesis, circuit synthesis and simulation.

These tasks use several categories of knowledge to achieve their objectives: Application
World Knowledge, Design Refinement Knowledge, and Device Knowledge. The different
representations of this knowledge are discussed in § 4.2.

4.1a Specification acquisition: The specification of the target system is represented in
SpeX, which is a visual language based on the statechart language (Harel 1987). Statecharts
have constructs for the modular specification of the control flow and concurrency
of complex systems. In addition to these constructs, SpeX also has features for specifying
data flows, data constraints, real-time constraints and implementation preferences for the
functional elements (FEs).

These specifications are acquired from the user with the help of a graphical user interface, 
and then converted to an equivalent textual form. The specifications are validated by 
checking for syntactic correctness. 
4.1b Algorithm design: The specification may contain nodes for abstract functionalities,
which have to be decomposed to a lower level of detail. This task performs the decomposition
by a top-down refinement process. Every decomposition step introduces new FEs, control
flows, data flows and constraints, and also maps the inputs and outputs of the parent
function to those of the children. The design refinements are performed by design refinement
rules, which have been encoded in APS, a specialized shell designed for this purpose.

The selection of the appropriate refinement to be made to the partial design consists of
two decision steps:

(1) selection of the function or subgoal to be refined, and

(2) selection of the refinement to be made on the selected function.

These decisions are made by the following strategy: 

• select the function that has the minimum (nonzero) number of applicable refinements. 

• for the chosen function, select a refinement that satisfies the maximum number of 
constraints. 

This strategy has been formulated by analogy with the Consistent Labelling Problem
(CLP) (Haralick & Shapiro 1979), where the variable selected is the one with the smallest
domain of candidate values, and, for the chosen variable, the value selected is the one that
satisfies the maximum number of constraints.
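The two decision steps can be written down directly. The sketch below is an illustration under assumptions, not the APS implementation: the applicable refinements of each function and the number of constraints each refinement satisfies are assumed to be already known, and the function and refinement names are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

def select_refinement(
    applicable: Dict[str, List[str]],
    satisfied: Dict[str, int],
) -> Optional[Tuple[str, str]]:
    """Choose (function, refinement) using the CLP-style strategy described above.

    applicable: function name -> refinements currently applicable to it
    satisfied:  refinement name -> number of constraints it satisfies (assumed precomputed)
    """
    # Only functions with a nonzero number of applicable refinements are candidates.
    candidates = {f: refs for f, refs in applicable.items() if refs}
    if not candidates:
        return None
    # Fewest applicable refinements first (the analogue of the smallest domain).
    func = min(candidates, key=lambda f: len(candidates[f]))
    # For that function, the refinement satisfying the most constraints.
    ref = max(candidates[func], key=lambda r: satisfied.get(r, 0))
    return func, ref


# Hypothetical partial design: "display" has no applicable refinement and is skipped.
choice = select_refinement(
    {"filter": ["fir", "iir", "fft"], "acquire": ["adc-poll", "adc-intr"], "display": []},
    {"adc-poll": 2, "adc-intr": 3, "fir": 1},
)
```

As in a CLP, committing first to the most constrained function tends to expose dead ends early, before effort is spent refining the less constrained parts of the design.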

The refinement process goes hand in hand with constraint propagation and conflict 
resolution. Constraints are propagated (Steinberg 1987) from one part of the design to 
another in order to (i) influence the proper choice of future refinement steps, and (ii) obtain 
conflict-free designs. Depending on the nature of the constraints, the conflicts are handled 
in different ways: 

• Conflicts among interval constraints imposed on parameters are resolved by the
algorithm proposed by Hyvonen (1992). In this method, such conflicts are resolved
by taking the intersection of the conflicting intervals.

• Mismatches in data type constraints are resolved by introducing a patch-up FE to
transform one data type to the other. For example, an analog-to-digital converter is
introduced to convert an analog signal (say, current) to its digital equivalent, so that
the necessary computations may be performed by the microprocessor.

• When communicating processes execute at different rates, buffers are set up to store
the communication data.

• If a storage location is accessed simultaneously by more than one concurrent process,
a conflict on the address bus arises. This type of conflict is resolved by introducing
an arbitration mechanism for the access, such that a slower process has precedence
over a faster process.

• If a conflict on performance constraints is detected, a preliminary hardware-software 
partitioning is done. Conflicts on performance constraints are due to (i) processor 
requirement conflicts, which arise due to conflicting concurrent processes, or (ii) data 
access time conflicts. 
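Of the strategies above, the interval case is the easiest to show concretely. The sketch below implements only the intersection step for a single parameter; the full method of Hyvonen (1992) also propagates intervals through arithmetic constraints, which is not reproduced here. The function name and the example values are illustrative assumptions.

```python
from typing import List, Optional, Tuple

Interval = Tuple[float, float]   # (low, high), both bounds inclusive

def intersect(intervals: List[Interval]) -> Optional[Interval]:
    """Resolve conflicting interval constraints on one parameter by intersection.

    Returns None if the intersection is empty, i.e. the conflict cannot be
    resolved this way and has to be handled by some other strategy.
    """
    low = max(lo for lo, _ in intervals)
    high = min(hi for _, hi in intervals)
    return (low, high) if low <= high else None


# Hypothetical sampling-rate constraints (Hz) arriving from two parts of the design.
resolved = intersect([(200.0, 1000.0), (100.0, 500.0)])
```

The result (200 Hz to 500 Hz) satisfies both original constraints; an empty intersection signals a genuine conflict.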
The resultant partial design is a control and data flow graph (CDFG) in which each FE
has a known implementation, in hardware and/or software. These FEs are called primitive
functions (PFs).


4.1c Architecture design: A key issue in hardware-software codesign is hardware-software
partitioning. From among the available implementations for each PF of the CDFG,
an appropriate implementation has to be selected such that the real-time constraints of the
problem are satisfied. In order to determine whether such constraints are satisfied, a feasible
schedule of the allocated implementations also has to be formed. When considering
single-microprocessor target systems without multi-tasking, the schedule must yield
single-threaded software. Thus, hardware-software partitioning consists of allocating
specific implementations to each PF and scheduling these implementations. This is
performed by the present task.

The allocation and scheduling steps are formulated as an integrated CLP, where a PF
is analogous to a variable and an available implementation is analogous to a value. The
solution to this CLP, subject to two sets of constraints (the timing constraints to be
satisfied and the area cost constraints to be optimized), yields two partitions,
one consisting of the hardware implementations and the other consisting of the software
implementations. For this application, however, a few extensions to the conventional CLP
are required:

• The set of variables is dynamic, since extra PFs may be added to the partial design to 
resolve conflicts between implementations of interacting PFs. 

• The cost of the target system does not increase with every labelling, since reuse of 
already allocated implementations does not add to the cost of the design. 

• Two levels of backtracking have to be considered, for the allocation and scheduling 
steps respectively. 

A forward checking algorithm is used to find a solution to the CLP, and the user is
allowed to specify a time bound for the search. The branch-and-bound search technique
yields a monotonic decrease in the cost of the best solution found. Search heuristics are
used to converge quickly to the optimal cost.
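A compact sketch of such an allocation search follows. It is a plain branch-and-bound with a simple forward check, written under strong simplifying assumptions: scheduling is reduced to requiring that the summed execution time of the software implementations (run as a single thread) stays within a deadline, hardware implementations are assumed to run concurrently and consume no processor time, and the dynamic addition of PFs, the second level of (scheduling) backtracking and the user-specified time bound are all omitted. The class and function names are hypothetical. The reuse extension listed above is modelled by charging each distinct hardware implementation only once.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

@dataclass(frozen=True)
class Impl:
    name: str    # identifies a reusable implementation (e.g. a device type)
    is_hw: bool  # True for hardware, False for software
    area: int    # area cost, charged once per distinct hardware implementation
    time: int    # execution time; software times add up (single-threaded software)

def partition(pfs: List[str], options: Dict[str, List[Impl]], deadline: int
              ) -> Optional[Tuple[int, Dict[str, Impl]]]:
    """Branch-and-bound over implementation choices, with a simple forward check."""
    best_cost: Optional[int] = None
    best: Dict[str, Impl] = {}

    def cost_of(assign: Dict[str, Impl]) -> int:
        # Reuse: several PFs mapped to the same hardware implementation pay once.
        return sum({i.name: i.area for i in assign.values() if i.is_hw}.values())

    def feasible(t: int, rest: List[str]) -> bool:
        # Forward check: every unassigned PF still has an option within the deadline.
        return all(any(o.is_hw or t + o.time <= deadline for o in options[q]) for q in rest)

    def search(k: int, assign: Dict[str, Impl], sw_time: int) -> None:
        nonlocal best_cost, best
        cost = cost_of(assign)
        if best_cost is not None and cost >= best_cost:
            return                        # bound: cost never decreases as PFs are added
        if k == len(pfs):
            best_cost, best = cost, dict(assign)
            return
        pf = pfs[k]
        for impl in sorted(options[pf], key=lambda i: i.area):  # cheapest first
            t = sw_time + (0 if impl.is_hw else impl.time)
            if t <= deadline and feasible(t, pfs[k + 1:]):
                assign[pf] = impl
                search(k + 1, assign, t)
                del assign[pf]

    search(0, {}, 0)
    return None if best_cost is None else (best_cost, best)


# Hypothetical PFs: one shared DSP block can implement both "filter" and "fft".
dsp = Impl("dsp", True, 50, 1)
options = {
    "filter": [dsp, Impl("fir_sw", False, 0, 8)],
    "fft": [dsp, Impl("fft_sw", False, 0, 9)],
    "ctrl": [Impl("ctrl_sw", False, 0, 2)],
}
```

Because the area cost can only grow (or, under reuse, stay the same) as more PFs are allocated, any partial assignment whose cost already reaches that of the best complete solution can be pruned; trying cheaper implementations first tends to establish a good bound early. With a tight deadline, both "filter" and "fft" go to the shared DSP block, and the cost is still charged only once.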

4.1d Interface design: After the hardware-software partition is formed, the synthesis
of the interface between the software partition and the hardware partition is the next key 
issue that has to be addressed in a hardware-software codesign framework. The present 
task synthesizes the interface by (i) allocating nonconflicting addresses to the devices to be 
placed on the system bus, (ii) converting event-based transitions of the CDFG into interrupt 
service routines (ISRs), and (iii) synthesizing the device drivers. 

The addresses are allocated by a heuristic algorithm that attempts to reduce the address 
decoding logic. The ISRs are created by a top-down refinement process which replaces the 
event-based transitions by the respective interrupt service actions, thus fragmenting the 
CDFG. In the modified CDFG, the events are captured by the respective interrupt lines, 
and the relevant ISRs handle the transitions. 
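The address-allocation heuristic itself is not spelled out here. As a hypothetical illustration of how allocation can reduce decoding logic, the sketch below applies a common rule of thumb: give each device a power-of-two block aligned to its own size, placing the largest blocks first, so that each chip select depends only on the high-order address bits. The function name, the device list and the address-space size are assumptions.

```python
from typing import Dict

def allocate(sizes: Dict[str, int], space: int) -> Dict[str, int]:
    """Assign base addresses, aligning each device to its power-of-two block size.

    sizes: device -> number of addresses (ports) it occupies (at least 1)
    space: size of the address space
    """
    # Round each request up to a power of two so that a device can be selected
    # by comparing only the address bits above its block size.
    blocks = {dev: 1 << (n - 1).bit_length() for dev, n in sizes.items()}
    result: Dict[str, int] = {}
    base = 0
    for dev in sorted(blocks, key=lambda d: -blocks[d]):   # largest blocks first
        size = blocks[dev]
        base = (base + size - 1) // size * size            # align to the block size
        if base + size > space:
            raise ValueError(f"address space exhausted while placing {dev}")
        result[dev] = base
        base += size
    return result


# Hypothetical device list; the 8253 occupies four ports (three counters and a control word).
sizes = {"8253": 4, "8255": 4, "ram": 2048, "adc": 1}
bases = allocate(sizes, space=4096)
```

With every base aligned to its block size, a device's chip select is a comparison of the bits above log2 of its block size, which keeps the decoder small.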
The device drivers are created, again by a refinement process, by synthesizing the interface
that allows a device to exhibit the behaviour of the function that it has to implement.
This interface, consisting of software as well as hardware modules, programs the
programmable devices and performs data transfers (along with the requisite data
transformations) to and from the device. A rule for synthesizing device drivers is described in
§ 4.2.

4.1e Software design: At this stage of the design, the CDFG of the partial design 
has several characteristics: (i) it does not contain any partial orders of FEs, (ii) there 
are no event-based transitions, (iii) explicit partitions have been formed for software and 
hardware implementations, and (iv) the implementations of all interfaces between software 
and hardware implementable modules have been determined, and these have also been 
partitioned as software or hardware implementations. From the software partition of the 
CDFG, the target system’s software is generated by macro-substitution of C-program 
templates for each FE, and subsequent compilation to machine code. 
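The macro-substitution step can be illustrated with Python's `string.Template`. The template texts, the FE encoding and the generated function name are hypothetical, chosen only to show one template being filled per FE in sequence; the subsequent compilation to machine code is not shown.

```python
from string import Template
from typing import Dict, List, Union

# Hypothetical C-program templates, keyed by the function name of an FE.
C_TEMPLATES: Dict[str, Template] = {
    "write": Template("*(volatile unsigned char *)$addr = $data;"),
    "read": Template("unsigned char $var = *(volatile unsigned char *)$addr;"),
}

def generate_c(fes: List[Dict[str, Union[str, int]]], name: str = "sw_partition") -> str:
    """Macro-substitute one template per FE, in the (sequential) CDFG order."""
    body = []
    for fe in fes:
        # Integer attributes (addresses, data) are emitted as C hex literals.
        params = {k: hex(v) if isinstance(v, int) else v for k, v in fe.items() if k != "fn"}
        body.append("    " + C_TEMPLATES[str(fe["fn"])].substitute(params))
    return f"void {name}(void)\n{{\n" + "\n".join(body) + "\n}\n"


code = generate_c([
    {"fn": "write", "addr": 0x803, "data": 0x30},
    {"fn": "read", "var": "x", "addr": 0x800},
])
```

Because the software partition contains no partial orders or event-based transitions at this stage, emitting the filled templates in CDFG order is enough to obtain a single-threaded program body.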

4.1f Circuit design: This task synthesizes the address decoding circuitry and establishes
the pin connectivity of the devices allocated during hardware-software partitioning.

4.1g Simulation: In order to verify that the synthesized target system behaves correctly,
the software and the hardware are cosimulated. For this, the software is treated as
initialization data for the system ROM, and the circuit is simulated. Event-driven simulation is
performed, with the behaviours of the devices represented in SpeX. Faults, if any, are
traced manually to their cause, and after the error is corrected, the design tasks are executed
again.

4.2 Design knowledge representation 

The processing and knowledge requirements of each task of the S-layer are summarized 
below. Along with each knowledge item, the type of representation used for that item is 
mentioned. These representation schemes will be explained subsequently. 

(1) Specification acquisition task: 

(a) Acquisition algorithm [Procedure] 

(b) The characteristics of design functions [Database] 

(2) Algorithm design task: 

(a) Refinement steps [Rules] 

(b) Constraint propagation algorithm [Rules] 

(c) World constraints [Rules] 

(d) Function constraints [Rules] 

(e) Conflict detection and resolution strategies [Rules] 

(f) PF-Flags [Facts] 


(3) Architecture design task: 

(a) Hardware-software partitioning algorithm [Procedure] 

(b) Knowledge about candidate implementations [Database] 

(4) Interface design task: 

(a) Address constraints of devices [Database] 

(b) Address allocation algorithm [Procedure] 

(c) Refinement steps [Rules] 

(5) Software design task: 

(a) Program templates [Database] 

(b) Macro-substitution and refinements [Procedure] 

(6) Circuit design task: 

(a) Refinement steps [Rules] 

(b) Constraint propagation algorithm [Rules] 

(c) Device constraints [Rules] 

(d) Function constraints [Rules and Facts] 

(e) Conflict detection and resolution strategies [Rules] 

(7) Simulation: 

(a) Simulation algorithm [Procedure] 

(b) Code for function behaviour [Database] 

(8) Task management: 

(a) Task hierarchy and sequence [Rules] 

(b) Failure handling [Rules] 

Two main types of representation schemes have been used to encode the above
requirements: rule-based and procedural. The input to the rules is encoded as facts
(i.e. working memory elements of the expert system shell that are not modified by any
rule), and the input to the procedures consists of data items, sorted and indexed on key
values. The latter can be conceptualized as a relational database; hence the use of the term
Database in the above enumeration. These two representation schemes are described below.

4.2a Rule-based representation: Rules are mainly used for (i) partial design refinement,
(ii) constraint propagation and analysis, and (iii) task sequencing and failure handling. These
rules have been implemented in the production system language APS (Mitra 1995).

The rules for partial design refinement perform functional decomposition of the FEs of
the partial design. Hence, the inputs and outputs of these rules are the different features of
the partial design: the FEs, data flows, control flows, concurrency definitions etc. Besides
deleting the FE that is being refined (referred to as the parent) and creating the new FEs
(referred to as the children), these rules also map the data and control flows of the parent
to those of its children.

A rule for synthesizing the device driver for an Intel 8253 timer is shown in figure 3a.
This rule refines the partial design to implement an up-counter using this device. Given the
address allocation of the device and the maximum value of the counter, the rule computes
the programming commands for the device and the transformation of the output data. Line
11 of the rule refers to a device constraint, in this case the address offset for the selected
timer. The device constraints are represented as facts in the working memory.
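The arithmetic on the right-hand side of the rule in figure 3a can be checked with a direct Python transcription. This is only a restatement of lines 10 and 14 to 28 for illustration, not part of the APS system; the function name and the example addresses are hypothetical. The control word follows the 8253 format: multiplying the counter number by 64 places it in the counter-select bits 7-6, and adding 48 (binary 00110000) selects loading the low byte and then the high byte, with mode 0 and binary counting.

```python
from typing import List, Tuple

def i8253_upcounter_init(base: int, counter: int, maxcnt: int) -> List[Tuple[int, int]]:
    """Return the (address, data) writes that the rule of figure 3a computes.

    base:    base address allocated to the 8253 (?base)
    counter: address offset of the selected counter, 0-2 (?ct)
    maxcnt:  maximum value of the counter (?mc)
    """
    if not 255 < maxcnt <= 0xFFFF:   # rule line 10; the 16-bit upper bound is an added check
        raise ValueError("rule applies to counts of 256..65535")
    basec = base + counter           # line 14: data port of the selected counter
    ctwd = counter * 64 + 48         # line 15: counter select + 0x30 (LSB then MSB, mode 0)
    cnt_h = maxcnt // 256            # line 16: high byte
    cnt_l = maxcnt - cnt_h * 256     # line 17: low byte
    # Lines 18-28: control word to base+3, then the low byte, then the high byte.
    return [(base + 3, ctwd), (basec, cnt_l), (basec, cnt_h)]


writes = i8253_upcounter_init(base=0x40, counter=1, maxcnt=1000)
```

For a count of 1000 on counter 1 at base 0x40, the writes are the control word 0x70 to port 0x43, followed by the low byte 232 and the high byte 3 to port 0x41.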

Rules are also used for constraint propagation, conflict detection and conflict resolution.
Rather than representing the constraints of the application world (e.g. the characteristics
of ECG signals) and of the devices (e.g. I/O constraints) as semantic nets and traversing
these nets in order to collect and propagate the constraints, the constraints are bundled
together with the specific rules that propagate them. The rule for propagating constraints
of the ECG domain into the partial design is shown in figure 3b. It defines the minimum
sampling rate of an ECG signal as 200 Hz and the frequency of the periodic ECG signal
as 1.2 Hz (i.e. 72 beats per minute).

The propagation of constraints may result in design conflicts. The detection of a conflict, 
and the strategy for resolving it, are combined into a single rule. 
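To make the situation-action style concrete, the sketch below recasts two rules as Python functions operating on a toy representation of the partial design. The first mirrors the ECG rule of figure 3b; the second combines detection and resolution of the analog/digital data type conflict described in § 4.1b by inserting an analog-to-digital converter as a patch-up FE. The data layout and function names are assumptions made for illustration, not APS syntax.

```python
from typing import Dict, List, Tuple

Flow = Tuple[str, str, str, str]   # (source FE, destination FE, produced type, expected type)

def propagate_ecg(dataflows: List[Dict[str, dict]]) -> int:
    """Python rendering of the rule in figure 3b; returns how many times it fired."""
    fired = 0
    for df in dataflows:
        c = df["constr"]
        # Situation: an ECG data flow whose minimum sampling rate is still undefined.
        if c.get("type") == "ECG_signal" and "min_sample_rate" not in c:
            c.update(min_sample_rate=200, freq=1.2)   # 200 Hz; 1.2 Hz == 72/60
            fired += 1
    return fired

def resolve_type_mismatch(flows: List[Flow]) -> List[Flow]:
    """One rule that both detects a data type conflict and resolves it."""
    out: List[Flow] = []
    for src, dst, produced, expected in flows:
        if produced == "analog" and expected == "digital":   # detection
            adc = f"adc_{src}"                                # resolution: patch-up FE
            out += [(src, adc, "analog", "analog"), (adc, dst, "digital", "digital")]
        else:
            out.append((src, dst, produced, expected))
    return out


flows = [{"constr": {"type": "ECG_signal"}}]
```

Like the `undef` test in figure 3b, the first rule's situation becomes false once it has fired, so repeated matching does not overwrite the propagated constraints.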

Task sequencing, as well as inter-layer communication, is also performed by rules. The
strategy followed here is the same as that used by R1 (McDermott 1982): the principle of
maximum specificity is used to initiate and terminate tasks.
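The specificity principle amounts to a conflict-resolution policy: when several rules match, the one with the most conditions fires, so a general rule that keeps a task running is overridden as soon as a more specific rule recognizing the task's completion becomes applicable. The sketch below simplifies specificity to a count of condition facts (production-system shells typically use a more detailed measure); the rule and fact names are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Set

@dataclass
class Rule:
    name: str
    conditions: List[str]   # facts that must all be present in working memory

def select_rule(rules: List[Rule], memory: Set[str]) -> Optional[Rule]:
    """Among the rules whose conditions all hold, prefer the most specific one."""
    matching = [r for r in rules if all(c in memory for c in r.conditions)]
    if not matching:
        return None
    return max(matching, key=lambda r: len(r.conditions))


# Hypothetical task-sequencing rules: the more specific rule terminates the
# architecture design task and initiates interface design.
rules = [
    Rule("continue-architecture-design", ["task:architecture"]),
    Rule("start-interface-design", ["task:architecture", "partition-complete"]),
]
```

While only "task:architecture" is in working memory, the general rule fires; once "partition-complete" is asserted, the more specific rule takes over and moves the design to the next task.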

4.2b Procedural representation: This kind of processing has been used for (i) specification
acquisition, (ii) hardware-software partitioning, (iii) software generation, and
(iv) simulation. The procedures have been encoded as C programs, and their input knowledge
is read from the relevant input files.


5. The D-layer 

The architecture of the D-layer is shown in figure 4. The input to the D-layer is a failure
report from the S-layer. This report contains information about the design function for
which the S-layer did not have implementation knowledge. The output of the D-layer is the
list of devices, and their interfaces, which can implement the desired function.

The behaviours of design functions and available devices are stored in two databases,
the Function Database (FDB) and the Device Database (DDB). Based on the contents
of the failure report, the D-layer retrieves the behaviour of the function and the behaviours
of candidate devices that may implement the function. The behaviour of the function is
then compared with the behaviour of each of these candidate devices, and the functional
specification of the necessary interface is generated by the behaviour-mapping
algorithms. Subsequently, an implementation of the interface is synthesized by using the
knowledge about the signal constraints and programming modes of the selected device.

The primary task of the D-layer is to compare two behaviours, those of a design function
and of an available device, in order to determine whether the former can be implemented by
the latter. Achieving this objective requires that the device behaviours and the function
(a)
 1. (prog-dev-8253-upcounter-init
 2.    ;; initialize counter-by-8253
 3.    (context
 4.      (task * 0 :fn int-devdrv)
 5.      ?s = (fe :fn start :attrs ?s0)
 6.      (fe * ?s0 :fn up-counter :attr ?al)
 7.      (impl :fn ?s0 :id ?id :dev 8253 :part ?prt)
 8.    )
 9.    ?mc = (getval ?al :maxcnt)
10.    (gt ?mc 255)
11.    (devinfo :dev 8253 :part ?prt :addr ?ct)
12.    (addr-space :id ?id :mapping ?m :from ?base)
13.    ==>
14.    ?basec = (plus ?base ?ct)    ;; counter address
15.    ?ctwd = (plus (mult ?ct 64) 48)
16.    ?cnt-h = (trunc (fldiv ?mc 256))
17.    ?cnt-l = (minus ?mc (mult ?cnt-h 256))
18.    ?s1 = (make fe :type state :fn write :attrs (setof
19.            (item :addr (plus ?base 3))
20.            (item :data ?ctwd) (item :map ?m)))
21.    ?s2 = (make fe :type state :fn write :attrs (setof
22.            (item :addr ?basec)
23.            (item :data ?cnt-l) (item :map ?m)))
24.    ?s3 = (make fe :type state :fn write :attrs (setof
25.            (item :addr ?basec)
26.            (item :data ?cnt-h) (item :map ?m)))
27.    (make tr :source ?s1 :destn ?s2)
28.    (make tr :source ?s2 :destn ?s3)
29.    (make mapc :type exit :oldst ?s :newst ?s3)
30.    (make mapc :type entry :oldst ?s :newst ?s1)
31.    (del ?s)
32. )

(b)
 1. (constr-prop-6    ;; transfer ECG constraints
 2.    (context
 3.      (task * 0 :fn algo-constr-prop)
 4.      ?df = (datafl :constr ?c)
 5.      (eq (getval ?c :type) ECG_signal)
 6.      (undef (getval ?c :min_sample_rate))
 7.    )
 8.    ==>
 9.    (mod ?df :constr (join ?c (setof (item :min_sample_rate 200)
10.         (item :freq 1.2))))    ;; 1.2 == 72/60
11. )

Figure 3. (a) Rule for synthesizing the device driver for an Intel 8253 operated as
an up-counter. (b) Rule for propagating constraints of ECG signals.
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behaviours be represented in a proper form, which facilitates the mapping between the 
two. The formalism of Finite State Machines (FSMs) is commonly used for representing 
behaviours of a large class of devices and circuits. However, this formalism is not suitable 
for the purpose of behavioural mapping because of the following reason. 

Most devices (e.g. microprocessor peripherals) contain memory. Representing the behaviour
of such a device as a single FSM requires a number of states that is exponential in the
number of memory bits. This is because each combination of the states of the memory
elements is explicitly encoded as a state of the machine itself. For example, an FSM
representation of an $n$-bit up-counter requires $2^n$ states. In such an encoding scheme,
transitions between memory states are triggered by the relevant data input events.

In the present system, we have adopted a new formalism for representing the deep
knowledge about the behaviour of devices and functions. We have also developed a mapping
approach, essentially a search algorithm, to determine whether a device can implement a
given function. It also derives the specification of the interface required to achieve such
a match.

5.1 Representing behavioural knowledge 

One approach to reducing the number of states is the EFSM modelling scheme (Devadas
et al 1991). This scheme makes the internal registers explicit, and register update operations
are encoded as actions on the transition arcs of the state transition graphs. However,
behavioural mapping of two EFSMs requires reachability analysis of both behaviours,
and this can be very expensive for large problems. The statechart language (Harel 1987)
provides another way of avoiding such an explosion of states, by using compositions of
communicating FSMs. In our modelling scheme, termed Composite Finite State Machines
(CFSMs), we use a similar composition technique for reducing the number of states.

A schematic of the CFSM model is shown in figure 5. A CFSM consists of a number of 
constituent machines (which may be FSMs or combinational units), operating concurrently 
and communicating with each other. One of these constituent machines is termed the 
Figure 5. CFSM as a set of communicating machines.


primary machine (PM), which is essentially an FSM and represents the abstract functional 
behaviour of the overall CFSM. The other constituent machines are called subsidiary 
machines (SMs), and handle the data and memory operations only. 

The functioning of an SM is controlled by the PM, or by another SM, via internal 
data and control signals. These internal communications may (i) initiate, stop, suspend or 
resume the functioning of an SM, (ii) access the results of computations performed by the 
SM, or (iii) be control signals generated by an SM. All external outputs are performed by 
the PM. As for the external inputs, the control inputs are accepted by the PM, and the data 
inputs are handled by the relevant SMs. The operational semantics of every constituent 
machine of a CFSM is similar to that of statecharts (Harel 1987). 

Every state transition arc in the PM is labelled by a triplet $e[c]/a$, where (i) $e$ is an event
that activates the transition, (ii) $c$ is a set of guard conditions that enable the actual firing
of the transition, and (iii) $a$ is a set of actions that are executed by the PM before entering
the next state.

The PM of an up-counter is shown in figure 6. In a conventional FSM representation, the
counter would have required MAXCNT + 1 states. In the PM, $f_0$ is the start state, $f_1$ is a
nonterminal state, and $f_2$ and $f_3$ are terminal states corresponding to whether or not an
overflow has occurred before the counter has been stopped.

This PM interacts with two SMs, $S_1$ and $S_2$. $S_1$ embodies the function no-of(CLK+),
i.e. it keeps track of the number of rising edges of the CLK input. $S_2$ represents a function
for monitoring the output of $S_1$: it generates a control signal when the output of $S_1$
exceeds a given bound. The PM collects data from $S_1$ when required, and the output of
$S_2$ is used to trigger the overflow transition in the PM.

The reduction in the number of states in this PM is due to the presence of the abstract
nonterminal state $f_1$, which denotes the process of counting. When the PM is in this state,
Control Inputs: START, STOP
Data Inputs: CLK(0:1)
Control Outputs: OVFL(0:1)
Data Outputs: DATA(0:MAXCNT-1)
SMs: S1: no-of(CLK+); S2: S1 > MAXCNT
Edge Labelings:
arc1: START/init(S1),init(S2)
arc2: STOP/DATA=S1,OVFL=0
arc3: S2/OVFL=1

Figure 6. PM of an up-counter.

the SM $S_1$ continues with its task of counting, and the SM $S_2$ continuously monitors the
output of $S_1$.
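The division of labour between the PM and its SMs can be sketched in Python. The class below is an illustrative executable model of figure 6, not the representation used by the D-layer. The class and method names are hypothetical, and the two terminal states are named by their meaning (`stopped`, `overflow`) rather than by the figure's labels. Each method corresponds to one labelled arc of the form e[c]/a.

```python
from typing import Optional

class UpCounterCFSM:
    """Up-counter of figure 6: a small PM plus two subsidiary machines.

    The PM has the same four states for every MAXCNT; the count itself lives
    in SM S1 instead of being encoded as MAXCNT + 1 FSM states.
    """
    PM_STATES = ("f0", "f1", "stopped", "overflow")   # start, counting, two terminals

    def __init__(self, maxcnt: int) -> None:
        self.maxcnt = maxcnt
        self.state = "f0"
        self.s1 = 0              # SM S1: no-of(CLK+), counts rising clock edges
        self.prev_clk = 0
        self.data: Optional[int] = None
        self.ovfl: Optional[int] = None

    def s2(self) -> bool:
        """SM S2: monitors S1 and signals when S1 > MAXCNT."""
        return self.s1 > self.maxcnt

    def start(self) -> None:     # arc1: START / init(S1), init(S2)
        if self.state == "f0":
            self.s1, self.prev_clk, self.state = 0, 0, "f1"

    def clk(self, level: int) -> None:
        """Data input, handled by S1; only active while the PM is counting."""
        if self.state != "f1":
            return
        if level == 1 and self.prev_clk == 0:
            self.s1 += 1
        self.prev_clk = level
        if self.s2():            # arc3: S2 / OVFL=1
            self.ovfl, self.state = 1, "overflow"

    def stop(self) -> None:      # arc2: STOP / DATA=S1, OVFL=0
        if self.state == "f1":
            self.data, self.ovfl, self.state = self.s1, 0, "stopped"


counter = UpCounterCFSM(maxcnt=3)
counter.start()
for level in (1, 0, 1, 0):
    counter.clk(level)
counter.stop()
```

However large MAXCNT is, the PM keeps four states; a flat FSM for the same counter would need MAXCNT + 1.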

5.2 Behavioural mapping 

The objective of behavioural mapping is to determine whether a function F can be
implemented by a given device D. A device D implements a function F if and only if there
exists an interface $\mathcal{I}$ such that any input sequence $I$ that is accepted by F is also
accepted by $D.\mathcal{I}$ (the device with the interface attached to it), and the output
sequence of F for that input is the same as the output of $D.\mathcal{I}$.

The device’s interface may transform the inputs of F before sending them to D. Similarly,
the outputs of D may also have to be transformed in order to make them equivalent
to those of F. These transformations are determined by the behaviour-mapping process,
and a library of available transformation operators is maintained for realizing them.

It is assumed that the behaviours of F and D are represented in the CFSM formalism, 
as described in the previous section. Although the PM of a CFSM does not use the data 
events directly, the functions that are to be performed on the data are encoded implicitly, 
by name, within the PM. Moreover, all outputs are generated by the PM alone. Thus, 
the PM captures the overall functional behaviour of the CFSM. Hence, for the purpose 
of behavioural mapping, it is sufficient to consider the behaviours as described by the 
respective PMs alone. 

Let the symbol 𝓕 denote the PM of the desired function F, and the symbol
𝒟 denote the PM of the available device D. Then 𝓕 is implementable by 𝒟
if and only if there exist signal transformation functions t_i and t_o such that, for every path
in 𝓕 from the start state to a final state which accepts the input sequence I, thus generating
the corresponding output sequence O, there exists a path in 𝒟 from a start state to a
terminal state which accepts the input sequence t_i(I) and produces an output sequence
O' with t_o(O') = O. Hence, a search is required to map 𝓕 (which is a graph)
onto 𝒟. Along with the mapping of the states of F, the signals of F also get mapped to
the signals of D, with the requisite transformations. These mappings are generated by a
variation of the FSM equivalence algorithm of Hopcroft & Ullman (1979). The mapping
is not necessarily surjective because, in general, a device may be capable of exhibiting a
wide range of functionalities, only a subset of which is needed to meet the requirements
of F. However, the mapping should be injective.
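For deterministic PMs and symbol-by-symbol interface functions, the implementability condition can be checked by a breadth-first walk over pairs of states, in the spirit of the product construction used for FSM equivalence. The sketch below is a simplified illustration and not the authors' mapping algorithm. It assumes that each PM is a table `(state, input) -> (next_state, output)`, that t_i and t_o act on single symbols, and that every state of F must map to exactly one state of D. The function `implementable` and the toy machines are hypothetical.

```python
from collections import deque
from typing import Callable, Dict, Hashable, Optional, Set, Tuple

# Deterministic PM: (state, input) -> (next_state, output)
Trans = Dict[Tuple[Hashable, Hashable], Tuple[Hashable, Hashable]]


def implementable(f: Trans, f_start, f_final: Set, d: Trans, d_start, d_final: Set,
                  t_i: Callable, t_o: Callable) -> Optional[Dict]:
    """Return an injective state mapping F -> D if D, seen through the
    interface (t_i, t_o), reproduces every transition of F; else None."""
    mapping = {f_start: d_start}
    queue = deque([(f_start, d_start)])
    seen = {(f_start, d_start)}
    while queue:
        fs, ds = queue.popleft()
        if fs in f_final and ds not in d_final:
            return None                     # F may stop here but D cannot
        for (src, a), (f_next, out_f) in f.items():
            if src != fs:
                continue
            step = d.get((ds, t_i(a)))
            if step is None:
                return None                 # D rejects the transformed input
            d_next, out_d = step
            if t_o(out_d) != out_f:
                return None                 # outputs differ after t_o
            if mapping.setdefault(f_next, d_next) != d_next:
                return None                 # one F state would need two D states
            if (f_next, d_next) not in seen:
                seen.add((f_next, d_next))
                queue.append((f_next, d_next))
    if len(set(mapping.values())) != len(mapping):
        return None                         # the mapping must be injective
    return mapping


# Toy example: F counts between START and STOP; D has an extra programming state "p"
f = {("idle", "START"): ("count", "-"), ("count", "STOP"): ("idle", "DATA")}
d = {("i", "GATE_HI"): ("c", "-"), ("c", "GATE_LO"): ("i", "LATCH"),
     ("i", "PROG"): ("p", "-"), ("p", "GATE_HI"): ("c", "-")}
t_i = {"START": "GATE_HI", "STOP": "GATE_LO"}.get
t_o = {"-": "-", "LATCH": "DATA"}.get
print(implementable(f, "idle", {"idle"}, d, "i", {"i"}, t_i, t_o))
```

The device state `p` is never used by F, so the resulting mapping is injective but not surjective. This is the situation described above, in which only a subset of the device's functionality is needed.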

Based on the results of the mapping, the implementation of the required interface is 
then determined by using the signal and timing constraints of the device. In addition, if 
the device is programmable, the programming modes are considered as well. 

6. Task management 

The D-layer and the different tasks of the S-layer, mentioned in the preceding sections, 
have to communicate with each other in order to form a coordinated problem-solving 
system. The details of such inter-task communication are described in this section. 

6.1 Communication of design information 

The inter-task communication has been implemented with the help of a shared blackboard-like
data structure. Each design task outputs a partial design on which the relevant
constraints are defined. This information can be conceptualized as having four fragments, all
of which are stored in the blackboard.

(1) The control and data flow graph (CDFG), which depicts the control and data flow among
the functional elements (FEs) of the design, along with the description of the
concurrency among the FEs, the constraints imposed on the data flows, and the
implementations that have been allocated to each FE.

(2) The constraint networks consisting of the timing constraints and the relations among 
the various parameters of the design. 

(3) The geometrical layout of the elements of the CDFG, in order to facilitate viewing of 
the partial design and its interactive editing. 

(4) The design history, i.e. the search tree that has been explored so far. 

These fragments are connected to each other very closely. For example, the leaves of the 
design history tree are the nodes of the CDFG; timing constraints are defined on sections 
of the CDFG; the parameters of the nodes of the CDFG are related by constraints, and the 
layout information is associated with each node and arc of the CDFG. 
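The close coupling of the four fragments can be sketched as linked records with a consistency check. The example below is only an illustration under our own assumptions. The class names `FE`, `HistoryNode` and `Blackboard`, the tuple encodings of flows and timing constraints, and the `consistent` check are all hypothetical. APS working memory elements are not modelled.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class FE:
    """Fragments (1) and (3): a CDFG node with its implementation and layout."""
    name: str
    implementation: Optional[str] = None
    params: Dict[str, float] = field(default_factory=dict)
    position: Tuple[int, int] = (0, 0)  # layout for viewing and interactive editing


@dataclass
class HistoryNode:
    """Fragment (4): a node of the design-history search tree."""
    decision: str
    children: List["HistoryNode"] = field(default_factory=list)
    fe: Optional[str] = None  # leaves refer to CDFG nodes by name


@dataclass
class Blackboard:
    fes: Dict[str, FE] = field(default_factory=dict)
    flows: List[Tuple[str, str, str]] = field(default_factory=list)  # (kind, src, dst)
    # Fragment (2): a timing constraint is (section of the CDFG, maximum delay in s)
    timing: List[Tuple[List[str], float]] = field(default_factory=list)
    history: HistoryNode = field(default_factory=lambda: HistoryNode("root"))

    def leaves(self, node: Optional[HistoryNode] = None) -> List[HistoryNode]:
        if node is None:
            node = self.history
        if not node.children:
            return [node]
        return [leaf for child in node.children for leaf in self.leaves(child)]

    def consistent(self) -> bool:
        """Check that every cross-link points at an existing CDFG node."""
        known = self.fes.keys()
        leaves_ok = all(leaf.fe in known for leaf in self.leaves() if leaf.fe is not None)
        flows_ok = all(s in known and d in known for _, s, d in self.flows)
        timing_ok = all(n in known for section, _ in self.timing for n in section)
        return leaves_ok and flows_ok and timing_ok
```

A task that edits one fragment can call `consistent()` before writing back, so that no other task reads a blackboard whose fragments disagree.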

Each task takes its input data from this shared blackboard and, on completion of the
processing, stores its outputs in the same place. This partial design information is represented
as working memory elements of APS. Some of the tasks of the S-layer are implemented
as procedures and require a different encoding of the data. These procedures convert the
data available in the blackboard into the required format and, on completion of the task,
store the results in the blackboard after performing the reverse conversion.


If
    task = Startup
Then
    Create subgoal for initiating Simulation
    Create subgoal for initiating Circuit-design
    Create subgoal for initiating Software-design
    Create subgoal for initiating Interface-design
    Create subgoal for initiating Architecture-design
    Create subgoal for initiating Algorithm-design
    Delete task


Figure 7. Rule for starting the processing of S-layer.

6.2 Inter-task control flow 

The S-layer is a hybrid expert system, in which rule-based and procedural tasks interact 
with each other. The sequence among these tasks is managed by a set of task management 
rules. For example, the first rule to be fired by the system, shown in figure 7, creates the 
subgoals corresponding to the different tasks of the topmost hierarchy of the task structure. 
These subgoals are created in the reverse order, so that in the goal stack the last goal created 
becomes the first goal to be initiated. 

Corresponding to each of these task-goals, there exists a rule for creating subgoals for 
the respective subtasks, or for initiating the relevant procedure. For example, the rule for 
initiating the search procedure for hardware-software partitioning, shown in figure 8, calls 
the appropriate procedure, and after its successful termination, deletes the subtask-goal 
from the goal-stack. 
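The effect of the rules in figures 7 and 8 on the goal stack can be sketched in a few lines of Python. This is a hypothetical model, not the APS rule engine. Goals are `(task, subtask)` pairs, popping a goal stands for deleting it, and the `subtasks` and `procedures` tables are illustrative assumptions.

```python
from typing import Callable, Dict, List, Optional, Tuple

Goal = Tuple[str, Optional[str]]  # (task, subtask)

# Order in which the figure 7 rule creates the top-level subgoals
TOP_LEVEL = ["Simulation", "Circuit-design", "Software-design",
             "Interface-design", "Architecture-design", "Algorithm-design"]


def run(subtasks: Dict[str, List[str]],
        procedures: Dict[str, Callable[[], bool]]) -> List[str]:
    """Pop goals from a stack and fire the matching (hypothetical) rule."""
    stack: List[Goal] = [("Startup", None)]
    trace: List[str] = []
    while stack:
        task, subtask = stack.pop()            # the goal is deleted once popped
        if task == "Startup":
            # Figure 7: the last subgoal pushed is the first one initiated
            stack.extend((t, None) for t in TOP_LEVEL)
        elif subtask is None:
            # Expand a task into its subtasks; reversed so the first listed runs first
            trace.append(task)
            stack.extend((task, s) for s in reversed(subtasks.get(task, [])))
        else:
            # Figure 8 style: call the procedure, then drop the subtask goal
            if not procedures[subtask]():
                raise RuntimeError(f"{subtask} failed: backtracking required")
            trace.append(f"{task}/{subtask}")
    return trace


trace = run({"Architecture-design": ["Hardware-software-partitioning"]},
            {"Hardware-software-partitioning": lambda: True})
print(trace)
```

Algorithm-design is the last subgoal created in figure 7 and therefore the first to run. Simulation, created first, runs last.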

The S-layer backtracks to recover from a failure. Failure occurs when a procedure returns a
failure, or when there is no rule to solve a subgoal. Unlike in OPS5 (Brownston
et al 1986), backtracking is built into the shell (APS) that is used for inferencing.
The backtracking mechanism of the shell is used to handle the following situations
of failure.

(1) Failure to solve a task, in which case the S-layer backtracks to the previous task.

(2) Failure to solve a subtask, in which case the S-layer backtracks to the previous subtask.

(3) Failure of the D-layer to find candidate implementations for a design function, in which
case the S-layer backtracks to find an alternative problem decomposition.


If
    task = Architecture-design
    subtask = Hardware-software-partitioning
Then
    Call hardware-software-partitioning-algorithm
    Delete subtask


Figure 8. Rule for initiating hardware-software partitioning.

In all these cases of backtracking, the user determines the source of the failure.
Subsequently, the S-layer backtracks to that point in the design history and continues its
processing from that point onwards, after revising the relevant design decision.
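User-directed backtracking can be pictured as truncating the design history at the decision the user blames and moving that decision to its next alternative. The sketch below is hypothetical. It treats the history as a flat list of `Decision` records, whereas the S-layer keeps a search tree, and it simply raises an error when the chosen decision has no alternatives left.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Decision:
    """One design decision recorded in the design history."""
    name: str
    alternatives: tuple
    choice: int = 0

    @property
    def value(self):
        return self.alternatives[self.choice]


def backtrack(history: List[Decision], point: int) -> List[Decision]:
    """Return a new history truncated at `point`, with that decision revised.

    `point` is the index of the decision the user identifies as the source
    of the failure; every later decision is discarded.
    """
    revised = history[: point + 1]
    d = revised[point]
    if d.choice + 1 >= len(d.alternatives):
        raise ValueError(f"no alternative left for {d.name!r}")
    revised[point] = Decision(d.name, d.alternatives, d.choice + 1)
    return revised


history = [Decision("decomposition", ("A", "B")),
           Decision("partitioning", ("hw", "sw")),
           Decision("device", ("8253",))]
print([d.value for d in backtrack(history, 1)])  # ['A', 'sw']
```

Because `backtrack` returns a new list, the discarded branch remains available if the user later wants to revisit it.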

6.3 Inter-layer communication 

As mentioned earlier, a failure report is created by the S-layer before it invokes the D-layer. 
The constituents of the failure report are: 

(1) Subgoal name, which is the name of the function that could not be solved by the S-layer.

(2) Constraint list, which is a list of constraints on the failed function that have to be
satisfied by any device that will solve the subgoal.

This report is generated by collecting together all control flows and data flows related
to the failed subgoal. These control and data flows, along with the function’s attributes,
constitute the constraints imposed on the function. In addition to the above-mentioned
control and data flows, the global constraints of the design are also included in the report.
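Assembling the failure report amounts to filtering the flows that touch the failed subgoal and then appending the function's attributes and the global constraints. The Python sketch below assumes a hypothetical encoding: flows are `(kind, source, destination, attribute)` tuples and constraints are plain strings. The paper does not specify the actual data format.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, List, Tuple

# Hypothetical flow record: (kind, source FE, destination FE, attribute)
Flow = Tuple[str, str, str, str]


@dataclass
class FailureReport:
    subgoal_name: str
    constraint_list: List[str] = field(default_factory=list)


def make_failure_report(subgoal: str, flows: Iterable[Flow],
                        attributes: Dict[str, object],
                        global_constraints: Iterable[str]) -> FailureReport:
    # Control and data flows that touch the failed subgoal
    constraints = [f"{kind}:{src}->{dst}:{attr}"
                   for kind, src, dst, attr in flows if subgoal in (src, dst)]
    # The function's own attributes
    constraints += [f"attr:{k}={v}" for k, v in sorted(attributes.items())]
    # Global design constraints are appended unchanged
    constraints += list(global_constraints)
    return FailureReport(subgoal, constraints)


flows = [("data", "ADC", "SquareWave", "freq"),
         ("control", "Timer", "SquareWave", "start"),
         ("data", "ADC", "Display", "value")]
report = make_failure_report("SquareWave", flows, {"period_us": 100}, ["bus=8-bit"])
print(report)
```

The flow between `ADC` and `Display` does not involve the failed subgoal, so it is excluded from the report.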

The D-layer is invoked from S-layer by the task management rules, after generating the 
failure report. Using the name of the failed subgoal as a key, the device database (DDB) is 
searched for candidate devices that may implement the subgoal. The output of the database 
search is either a single candidate device, or a list of candidate devices, depending on a 
parameter supplied by the user. Behavioural matching is then performed on the candidates 
extracted from the DDB. 

The output of the D-layer is a list of devices and their interfaces, each of which will 
implement the failed subgoal. In addition, the D-layer also returns a value, which indicates 
whether it has succeeded or failed in its task. If the list of devices is empty (i.e. if no device 
is found that can implement the failed subgoal) then the D-layer returns NIL, else it returns 
TRUE. If the returned value is NIL, the S-layer backtracks to find an alternative problem
decomposition; else it continues with the present design. On the success of the D-layer, a 
flag is created in the S-layer’s working memory to indicate that the failed subgoal is a PF. 
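The following Python sketch illustrates the call-and-return protocol between the two layers. It is a hypothetical illustration. The DDB is modelled as a dictionary keyed by subgoal name, behavioural matching is replaced by a caller-supplied `matches` function, NIL is represented by `None`, and the DDB contents in the example are invented.

```python
from typing import Callable, Dict, List, Optional, Set, Tuple

Match = Callable[[str, str], Optional[str]]  # (subgoal, device) -> interface or None


def d_layer(subgoal: str, ddb: Dict[str, List[str]], matches: Match,
            want_all: bool = True) -> Tuple[List[Tuple[str, str]], Optional[bool]]:
    """Search the DDB by subgoal name, then behaviourally match each candidate.

    Returns (devices with interfaces, True) on success and ([], None) on
    failure, mirroring the TRUE/NIL convention of the text.
    """
    candidates = ddb.get(subgoal, [])
    if not want_all:                       # user parameter: single candidate only
        candidates = candidates[:1]
    found = []
    for device in candidates:
        interface = matches(subgoal, device)
        if interface is not None:
            found.append((device, interface))
    return found, (True if found else None)


def s_layer_step(subgoal: str, ddb: Dict[str, List[str]], matches: Match,
                 working_memory: Set[tuple]):
    devices, status = d_layer(subgoal, ddb, matches)
    if status is None:
        return "backtrack"                 # look for an alternative decomposition
    working_memory.add(("PF", subgoal))    # flag the subgoal as a PF
    return devices


ddb = {"square-wave-generator": ["Intel 8253", "Intel 8255"]}
match = lambda sg, dev: "mode 3" if dev == "Intel 8253" else None
wm: Set[tuple] = set()
print(s_layer_step("square-wave-generator", ddb, match, wm), wm)
```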

The D-layer is invoked by the S-layer either from the Algorithm Design Task or from the 
Architecture Design Task. The PF information, created on the D-layer’s success, is used by 
the algorithm design task to continue with its processing. For the purpose of the architecture 
design task, the information passed is that for device allocation. In addition to these two 
tasks, the interface design task also requires information about the new implementation, 
in order to integrate it into the design. In cases where the implementation is classified as a 
hardware implementation, the information for the interface design task specifies the address 
constraints, the device initialization process, the device stopping process, and the mapping 
between the dataflows of the function and of the device. In cases where the implementation 
is classified as a software implementation, the information for the interface design task 
specifies the address constraints, the device driving process, and the mapping between the 
dataflows of the function and of the device. 
7. Results

The proposed CAD framework has been implemented on an HP-350 workstation and
applied to design a variety of microprocessor-based systems: (i) a speed controller for a DC
motor, (ii) different versions of over-current protectors (by polling, by interrupt, multiple
protectors), and (iii) an ECG monitoring system. The D-layer has been used to find
implementations for the following subgoals of the S-layer: (i) an up-counter by the Intel 8253 timer,
(ii) a square wave generator by the Intel 8253 timer, (iii) a handshake data input protocol by the Intel
8255 port, and (iv) multiple interrupt handling by the Intel 8259 interrupt controller.

For the speed controller problem, the user specifies the characteristics of the motor, 
the way it has to be operated (by the phase control method), and the desired speed. The 
partial functional decomposition tree for this problem is shown in figure 9a. One of the 
subgoals of the problem is a Square Wave Generator (used for operating the motor), for 
which no implementation is known to the S-layer. The D-layer is invoked for this subgoal, 
and the state and signal bindings produced by the mapping algorithm, for implementing 
the function by an Intel 8253 timer operating in mode 3, are shown in figure 9b. The 
specifications of the device’s interface are passed to the S-layer, which continues with the 
other design tasks. Finally, the target system’s circuit (shown schematically in figure 9c) 
and the software are generated and cosimulated. 
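For reference, the arithmetic behind an 8253 counter in mode 3 (square wave rate generator) is short. The output frequency is the counter's clock frequency divided by the loaded count N. For odd N, the output stays high for one clock period longer than it stays low. The sketch below assumes a hypothetical 2 MHz counter clock and a 50 Hz target. It conservatively accepts only 2 ≤ N ≤ 65536 for the 16-bit down counter, with a full count of 65536 loaded as 0 in binary mode. It does not reproduce the interface specification generated by the D-layer.

```python
def mode3_count(f_clk_hz: float, f_out_hz: float) -> int:
    """Divisor N for a counter in mode 3, where f_out = f_clk / N."""
    n = round(f_clk_hz / f_out_hz)
    if not 2 <= n <= 65536:  # 16-bit down counter; N >= 2 assumed conservatively
        raise ValueError(f"divisor {n} is outside the counter range")
    return n


def mode3_high_low(n: int) -> tuple:
    """Clock periods spent high and low; for odd N the high half is one period longer."""
    return (n + 1) // 2, n // 2


def count_register_value(n: int) -> int:
    """Binary value to load into the count register; a full count of 65536 is loaded as 0."""
    return n & 0xFFFF


n = mode3_count(2_000_000, 50)          # hypothetical 2 MHz clock, 50 Hz output
print(n, mode3_high_low(n), count_register_value(n))  # 40000 (20000, 20000) 40000
```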

In this way, hardware-software codesign of microprocessor-based systems is performed 
by the S-layer, and the D-layer imparts resilience to the S-layer. 


8. Conclusion 

In this paper, we have described a new two-layer architecture for computer-aided
design. The incorporation of the behaviour modelling feature in the deep layer provides
resilience to the overall system, by allowing the shallow layer to fall back on such deep
reasoning whenever the shallow knowledge is found to be incomplete. The necessity of
deep reasoning has been emphasized by expert system researchers for quite some time.
This paper convincingly demonstrates the applicability of such reasoning in CAD
systems, and successfully employs the proposed CAD framework to solve real-life industrial
problems.


References 

Bailey G D, Raghavan S, Gupta N, Lambird B, Lavine D 1991 InFuse - an integrated expert
neural network for intelligent sensor fusion. In Proc. IEEE/ACM Int. Conf. on Developing and
Managing Expert System Programs, pp 196-201

Barstow D 1985 Domain-specific automatic programming. IEEE Trans. Software Eng. 11:
1321-1336

Brewer F, Gajski D D 1990 Chippe: A system for constraint driven behavioral synthesis. IEEE
Trans. Comput. Aided Design 9: 681-695

Brewer F D, Gajski D D 1991 An expert system paradigm for design. In Proc. 23rd DAC (New
York: IEEE Press)



738 


Raj S Mitra et al 


Brownston L, Farrell R, Kant E, Martin N 1986 Programming expert systems in OPS5 (Reading,
MA: Addison-Wesley)

Chou P, Walkup E A, Borriello G 1994 Scheduling for reactive real-time systems. IEEE Micro
14: 37-47

Devadas S, Keutzer K, Krishnakumar A S 1991 Design verification and reachability analysis
using algebraic manipulation. In Proc. ICCD-91 (New York: IEEE Press) pp 250-258

Haralick R M, Shapiro L G 1979 The consistent labelling problem, part 1. IEEE Trans. Pattern
Anal. Machine Intell. 1: 173-184

Harel D 1987 Statecharts: A visual formalism for complex systems. Sci. Comput. Program. 8:
231-274

Hopcroft J E, Ullman J D 1979 Introduction to automata theory, languages and computation
(Reading, MA: Addison-Wesley)

Hu X, D’Ambrosio J G, Murray B T, Tang D 1994 Codesign of architectures for automotive
powertrain modules. IEEE Micro 14: 17-25

Hyvonen E 1992 Constraint reasoning based on interval arithmetic: the tolerance propagation
approach. Artif. Intell. 58: 71-112

Jullig R K 1993 Applying formal software synthesis. IEEE Software 10: 11-22

Kalavade A, Lee E A 1993 A hardware-software codesign methodology for DSP applications.
IEEE Design Test (Sept.): 16-28

Kambhampati S, Cutkosky M R, Tenenbaum J M, Lee S H 1993 Integrating general purpose
planners and specialized reasoners: Case study of a hybrid planning architecture. IEEE Trans.
Syst. Man Cybern. 23: 1503-1518

Keller R, Baudin C, Iwasaki Y, Nayak P, Tanaka K 1990 Compiling redesign plans and diagnosis
rules from a structure/behaviour device model. Tech. Rep. FIA-90-07-01

Keravnou E T, Washbrook J 1989 What is a deep expert system? An analysis of the architectural
requirements of second generation. Knowledge Eng. Rev. 4: 3

Keuneke A 1991 Device representation: The significance of functional knowledge. IEEE Expert
(April)

Kumar S, Aylor J H, Johnson B J, Wulf W A 1993 A framework for hardware software codesign.
IEEE Computer (Dec.): 39-45

McDermott J 1982 R1: A rule-based configurer of computer systems. Artif. Intell. 19: 39-88

McFarland M C, Parker A C, Camposano R 1990 The high level synthesis of digital systems.
Proc. IEEE 78: 301-318

Micheli G D 1994 Computer aided hardware software codesign. IEEE Micro 14: 10-16

Mitchell T M, Steinberg L I, Shulman J S 1985 A knowledge-based approach to design. IEEE
Trans. Pattern Anal. Machine Intell. 7: 502-510

Mitra R S 1995 Hardware software codesign of microprocessor based systems: A knowledge
based framework. PhD thesis, Indian Institute of Technology, Kharagpur

Mitra R S, Guha B, Basu A 1993 Rapid prototyping of microprocessor-based systems. In Proc.
Int. Conf. on Comput. Aided Design (ICCAD-93) (New York: IEEE Press) pp 600-603

Mitra R S, Kumar M, Basu A 1994 Design of microprocessor-based systems: A knowledge-based
approach. IEEE Trans. Ind. Electron. 41: 352-360

Mitra R S, Roop P S, Basu A 1996 A new algorithm for implementation of design functions by
available devices. IEEE Trans. Very Large Scale Integrated Syst. 4: 170-180

Smailagic A, Siewiorek D P 1993 The VuMan 2 wearable computer. IEEE Design Test (Sept.):
56-67

Srivastava M B, Brodersen R W 1991 Rapid-prototyping of hardware and software in a unified
framework. In Proc. Int. Conf. on CAD (ICCAD-91) (New York: IEEE Press) pp 152-155


Knowledge-based CAD framework 


739 


Steinberg L I 1987 Design = top down refinement plus constraint propagation plus what? In Proc.
IEEE Systems Man and Cybernetics Conf. (New York: IEEE Press)

Tseng C, Siewiorek D P 1986 Automated synthesis of data paths in digital systems. IEEE Trans.
Comput. Aided Design 5: 379-395



Sadhana, Vol. 21, Part 6, December 1996, pp. 741-773. © Printed in India.


Review of hypersonic research investigations in IISc shock 
tunnel (HST1) 

N M REDDY, K NAGASHETTY, G JAGADEESH and K P J REDDY 

Department of Aerospace Engineering, Indian Institute of Science, 

Bangalore 560 012, India 
email: laser@aero.iisc.ernet.in

MS received 11 April 1996; revised 16 August 1996 

Abstract. Real gas effects dominate the hypersonic flow fields encountered
by modern-day hypersonic space vehicles. Measurement of aerodynamic data
for the design applications of such aerospace vehicles calls for special kinds of 
wind tunnels capable of faithfully simulating real gas effects. A shock tunnel is 
an established facility commonly used along with special instrumentation for 
acquiring the data for this purpose within a short time period. The hypersonic 
shock tunnel (HST1), established at the Indian Institute of Science (IISc) in 
the early 1970s, has been extensively used to measure the aerodynamic data 
of various bodies of interest at hypersonic Mach numbers in the range 4 to 
13. Details of some important measurements made during the period
1975-1995, along with the performance capabilities of the HST1, are presented in this
review. In view of the re-emergence of interest in hypersonics across the globe 
in recent times, the present review highlights the suitability of the hypersonic 
shock tunnel at the IISc for future space application studies in India. 

Keywords. Shock tunnel; shock waves; boundary layers; real gas effects; 
heat transfer; hypersonic aerodynamics. 


1. Introduction 

There has been renewed interest in the field of hypersonics in recent times due to the
proposed plans for the development of reusable space planes and aero-assisted space
transfer vehicles. The major activities in this direction include the American NASP, the
British HOTOL, the Japanese HOPE and the Indian Aerospace Plane. This resurgence in
hypersonics has spurred active research into many aspects of the hypersonic flight encountered
by these vehicles. Research in this area can broadly be categorized into computational
fluid dynamics (CFD) studies and experimental research in ground-based facilities, such as
hypersonic wind/shock tunnels, that are capable of simulating flight conditions in the laboratory.
Experimental flow field and surface measurements often supply data for the validation of
CFD codes and also help in understanding hypersonic flow phenomena.

A distinct feature of flight at hypersonic Mach numbers is the occurrence of real gas
effects due to the passage of air through the bow shock wave in front of the vehicle,
which results in a sudden increase of temperature and pressure. The temperature rise is
proportional to the square of the speed for sufficiently high-speed flights. On some parts of
the vehicle, such as the nose or leading edge, the temperature rise may be high enough even to
dissociate and ionize air molecules. This alters the flow characteristics over the vehicle
and constitutes the real gas effects, which are very difficult to analyse
theoretically. Thus, these effects are often estimated experimentally by simulating the
hypersonic flow over scaled-down models of the prototypes.
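A one-line energy balance shows why the temperature rise scales with the square of the speed. For a perfect gas brought to rest, the total enthalpy c_p T + V²/2 is conserved, so T0 - T = V²/(2 c_p). The sketch below assumes a constant c_p of 1004.5 J/(kg K) for air. This is a perfect-gas idealization, not a result from the review.

```python
CP_AIR = 1004.5  # J/(kg K): perfect-gas specific heat of air (assumed constant)


def stagnation_temperature_rise(speed: float, cp: float = CP_AIR) -> float:
    """Temperature rise T0 - T = V**2 / (2 cp) when the flow is brought to rest."""
    return speed ** 2 / (2.0 * cp)


for v in (2000.0, 4000.0, 7000.0):
    print(f"{v:6.0f} m/s -> {stagnation_temperature_rise(v):8.0f} K")
```

Doubling the speed quadruples the rise: the predicted values are roughly 2000 K, 8000 K and 24 000 K. These perfect-gas figures overstate the real temperatures, because dissociation and ionization absorb part of the energy. That energy absorption is the real gas effect described above.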

In a blowdown type hypersonic wind tunnel, the required flow Mach number in the test
section is achieved by decreasing the freestream temperature, which reduces the
speed of sound and leads to a corresponding increase in the Mach number. For a given
reservoir temperature, the upper limit on the Mach number is imposed by condensation of the
test gas in the test section. Thus conventional blowdown type hypersonic wind tunnels
are capable of producing high flow Mach numbers, but without the accompanying high
temperatures needed to simulate real gas effects. This regime is usually referred to as
Mach-Reynolds simulation (Hornung 1988), in which air may still be considered a perfect gas.
However, this does not provide correct simulation above Mach number 6, since the
occurrence of real gas effects is coupled to the temperature. Real gas effects can be effectively
simulated by generating air flow in the tunnel with energy matching that of the
hypersonic vehicle in flight, which can be achieved by expanding the test gas from a reservoir at very
high temperature and pressure through a nozzle. This is achieved in a shock tunnel by
using a shock wave to heat and compress the test gas rapidly and expanding the shocked
gas through a nozzle to the required Mach number in the test section.
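The trade-off in a blowdown tunnel can be quantified with the isentropic relation T = T0 / (1 + (γ - 1) M²/2) and a = √(γRT). The sketch below assumes perfect-gas air (γ = 1.4, R = 287 J/(kg K)), a hypothetical 500 K reservoir and, for comparison, flight in air at an assumed 220 K. None of these numbers are taken from the review.

```python
import math

GAMMA = 1.4      # ratio of specific heats, perfect-gas air (assumed)
R_AIR = 287.0    # J/(kg K)


def static_temperature(t0: float, mach: float, gamma: float = GAMMA) -> float:
    """Isentropic expansion from reservoir temperature t0: T = T0 / (1 + (g-1)/2 M^2)."""
    return t0 / (1.0 + 0.5 * (gamma - 1.0) * mach ** 2)


def speed_of_sound(t: float, gamma: float = GAMMA, r: float = R_AIR) -> float:
    return math.sqrt(gamma * r * t)


# Hypothetical blowdown tunnel: 500 K reservoir expanded to Mach 6
t_test = static_temperature(500.0, 6.0)
v_test = 6.0 * speed_of_sound(t_test)
# Flight at the same Mach number in air at an assumed 220 K
v_flight = 6.0 * speed_of_sound(220.0)
print(f"T_test = {t_test:.0f} K, V_test = {v_test:.0f} m/s, V_flight = {v_flight:.0f} m/s")
print(f"specific kinetic energy ratio = {(v_test / v_flight) ** 2:.2f}")
```

Both flows are at Mach 6, but the tunnel gas (about 61 K, 940 m/s) carries only about 28% of the specific kinetic energy of the flight case (about 1780 m/s). This is why Mach-Reynolds simulation cannot reproduce real gas effects.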

A hypersonic shock tunnel, HST1, established at the Indian Institute of Science (IISc),
has been in operation for the past two decades, with upgrades of the data acquisition
system from time to time. The tunnel has been extensively used for measurements such
as aerodynamic forces and heat transfer rates over bodies of interest at Mach numbers
varying from 4 to 13. The purpose of the present paper is to review the important
contributions made by using the HST1 tunnel, along with a description of the instrumentation
developed specifically for measurements in short duration test facilities. A brief description
of the performance capabilities of the shock tunnel is presented before describing the
important results obtained. In addition, a brief description of the proposed improvements
for enhancing the performance capabilities of the shock tunnel, along with the proposed
new techniques for flow visualization at hypersonic Mach numbers in the tunnel, is
presented.


2. Description of the IISc hypersonic shock tunnel 

The hypersonic shock tunnel (Reddy 1978) consists of a shock tube and the wind tunnel
sections, as shown in figure 1a. The corresponding x-t diagram, shown in figure 1b,
describes the principle of operation. The shock tube is an aluminium tube of 101 mm
outer diameter and 50.7 mm inner diameter, separated into driver and driven sections by
an aluminium diaphragm of 3 mm thickness. The 2.4 m long driver is equipped with an
arrangement to feed high pressure driver gas from high pressure cylinders. The 6.0 m
long driven section has provision for evacuating the tube, feeding any desired test gas and
measuring the vacuum level. In addition, two ports 30.5 cm apart are located towards the
end of the driven section for the purpose of shock speed measurement. The pressure behind
the primary and the reflected shock waves is monitored by a pressure transducer at the end
of the driven section.

Figure 1. Schematic diagram of a typical shock tunnel (a), along with an x-t diagram (b).

The wind tunnel section, separated from the shock tube by a thin paper diaphragm,
consists of a nozzle, attached to the end of the shock tube, which terminates in a 45 cm long
test section of 30 cm × 30 cm size. The test section is provided with circular optical quality
glass windows for visual observations. A large tank of 0.72 m³ volume is attached to the
end of the test section to collect the test gas in every run and also to swallow all the shock
and compression waves that are created during the starting process in the nozzle. The
tunnel section downstream of the paper diaphragm is evacuated to a vacuum level of the
order of 10⁻⁶ mbar (1 mbar = 10² Pa) using a rotary pump-diffusion pump combination
supplied by the Hind High Vacuum Co. Ltd., Bangalore.

The nozzle at the end of the shock tube expands the test gas at high temperature and
pressure behind the shock wave to any required Mach number in the test section, provided the
appropriate area ratio is maintained. The tunnel, when operated in straight-through mode,
uses truncated conical nozzles with an entrance diameter of 50.7 mm and exit diameters
of 300 mm and 150 mm to yield flow Mach numbers of 6 and 4, respectively. The tunnel
is operated in the reflected mode by adding a short section of convergent-divergent nozzle
with throat diameters varying from 25 mm to 7 mm, which yield flow Mach numbers
varying from 7 to 13, respectively. However, no effort was made to operate the tunnel in
a tailored mode to enhance the test time. Further details of the hypersonic shock tunnel
have been reported earlier (Reddy 1978).

Figure 2. Variation of the flow Mach number M2 behind the shock wave with the
shock strength in the shock tube. The shock Mach number range is varied by using
air, helium and hydrogen as driver gases.

Thin film heat transfer gauges located 30.5 cm apart at the end of the driven section sense 
the arrival of the shock front due to the sudden jump in the surface temperature of thin films. 
An electronic counter is triggered on by the signal from the first gauge and switched off 
by the signal from the second gauge. Thus the counter reading in microseconds indicates 
the transit time of the shock wave. The pressure jump across the shock wave is monitored 
by a piezoelectric pressure transducer (PCB piezotronics) mounted flush with the inner 
surface of the tube at the end of the driven section. The performance of the shock tube is 
indicated by the calibration curve shown in figure 2. 
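The counter reading converts directly into a mean shock speed over the 30.5 cm gauge spacing. Dividing that speed by the speed of sound in the undisturbed test gas gives the shock Mach number, which is the abscissa of a calibration curve such as figure 2. The sketch below assumes air as the test gas at an initial temperature of 300 K (γ = 1.4, R = 287 J/(kg K)) and a hypothetical counter reading of 150 µs.

```python
import math

GAUGE_SPACING = 0.305  # m, distance between the two thin-film gauges


def shock_speed(transit_us: float, spacing: float = GAUGE_SPACING) -> float:
    """Mean shock speed between the gauges from the counter reading (microseconds)."""
    return spacing / (transit_us * 1e-6)


def shock_mach(transit_us: float, t1: float = 300.0,
               gamma: float = 1.4, r: float = 287.0) -> float:
    """Shock Mach number relative to the speed of sound in the undisturbed test gas."""
    a1 = math.sqrt(gamma * r * t1)
    return shock_speed(transit_us) / a1


print(f"{shock_speed(150.0):.0f} m/s, Ms = {shock_mach(150.0):.2f}")  # 2033 m/s, Ms = 5.86
```

The result is an average over the gauge spacing. Any attenuation of the shock along the driven section is therefore smeared out.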


3. Measurement of aerodynamic forces over missile shaped bodies 

Since the duration of the uniform flow in the shock tunnel is only a few milliseconds, 
regular force balances used in conventional blowdown type wind tunnels cannot be used 
to obtain aerodynamic data. Hence special techniques, like monitoring of the motion of 
the free-flying model during the run using a high speed photographic technique and an 
interferometric technique with a reflectometer mounted on the model, have been tried out to 
measure the aerodynamic force data (Bernstein & Stott 1981). However, these techniques
have inherent drawbacks related to the integrity of the model and the complexity of model making.
Figure 3. (a) Assembly diagram of the three-component fast-response force balance
system and (b) a photograph of the complete balance system along with one of the
test models.

A novel three-component balance system using fast-response accelerometers with a
response time of a fraction of a millisecond has been developed for the first time for use
in the HST1. A schematic diagram of the balance system is shown in figure 3a and a 
photograph of the balance system along with a typical missile model is shown in figure 3b. 
The details of this balance system along with the basic theory have been reported earlier 
(Reddy 1983; Joshi & Reddy 1986; Channa Raju & Reddy 1990). 

This balance system has been successfully used to measure aerodynamic force
coefficients such as lift, drag and moment over different aerodynamic bodies of interest
flying at Mach numbers 3.85, 5.5 and 9.15 (Shah 1983; Joshi 1985; Joshi & Reddy 1985,
1986; Channa Raju 1989; Channa Raju & Reddy 1990). Model configurations chosen
for the investigations are the typical nose cone-cylinder configurations used for missiles
and re-entry vehicles. The list of models along with the Mach numbers is presented in
table 1. The flowfields around these models at hypersonic Mach numbers, described by the
full Navier-Stokes equations, are extremely complex to investigate analytically. Hence the
experimental data presented here are of great importance in understanding the aerodynamic
behaviour of these bodies at high Mach numbers.



746


N M Reddy et al


Table 1. Details of the models used for aerodynamic force measurements in HST1.

Model*                                         Flare angle    Mach number
Blunt nose cone-cylinder                       No flares      3.85; 5.5
Blunt nose cone-cylinder with flares & fins    10°            3.85; 5.5; 9.15
Blunt nose cone-cylinder with flare & fins     5°             3.85; 5.5; 9.15
Sharp nose cone-cylinder                       No flare       5.5

* Blunt nose radius = 0.015 m; half angle = 10.5°; length = 0.2266 m.

The measured force coefficients based on the body diameters are shown in figures 4 
to 6 for all models at angles of attack varying from 0° to 17°. The experimental results 
are compared with the theoretical force coefficients predicted using modified Newtonian 
theory which takes into account the centrifugal forces over the spherical portion of the 
models (Truitt 1959). The measured data match very well with predicted values for the lift 
coefficient. Other important conclusions drawn from these studies are that, for the sharp
nose cone-cylinder and blunt nose cone-cylinder models, flow separation may occur at an
angle of attack of about 12°, and that flow separation is delayed by the addition of flare and
fins. The measured values of drag coefficient at all Mach numbers are found to be higher 
than predicted values. This may be due to the fact that the Newtonian theory does not 
include the contribution of the skin friction to the total drag. The lift coefficient decreases 
with the increase of the flow Mach number for all the models, whereas the dependence of 
the drag coefficient on the Mach number is negligible. The measured values of moment 
coefficients for all the models at all three test Mach numbers match predicted values very 
well. 
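The reduction from measured accelerations to coefficients can be sketched as follows. The assumptions are ours, not a statement of the balance theory in the cited reports. The model is taken to be effectively unrestrained during the test time, so that the body-axis forces are m·a and the pitching moment is I·θ̈. The dynamic pressure is written as q = γ p∞ M∞²/2 for a perfect gas. The reference area is π d²/4, and the body diameter d is also used as the moment reference length. The input values in the example are hypothetical.

```python
import math


def dynamic_pressure(p_inf: float, mach: float, gamma: float = 1.4) -> float:
    """q = rho V^2 / 2, written as gamma p M^2 / 2 for a perfect gas."""
    return 0.5 * gamma * p_inf * mach ** 2


def force_coefficients(mass, a_axial, a_normal, inertia, pitch_accel,
                       alpha_deg, p_inf, mach, diameter):
    """Lift, drag and pitching-moment coefficients from model accelerations.

    Assumes the model is effectively unrestrained during the test time, so the
    body-axis forces are m*a and the moment is I*theta_ddot.
    """
    q = dynamic_pressure(p_inf, mach)
    area = math.pi * diameter ** 2 / 4.0        # reference area from body diameter
    axial, normal = mass * a_axial, mass * a_normal
    alpha = math.radians(alpha_deg)
    # Rotate body-axis forces into wind axes
    lift = normal * math.cos(alpha) - axial * math.sin(alpha)
    drag = normal * math.sin(alpha) + axial * math.cos(alpha)
    moment = inertia * pitch_accel
    return lift / (q * area), drag / (q * area), moment / (q * area * diameter)


# Hypothetical run: 0.5 kg model at zero incidence, Mach 5.5, p_inf = 500 Pa, d = 40 mm
cl, cd, cm = force_coefficients(0.5, 20.0, 0.0, 1e-4, 0.0, 0.0, 500.0, 5.5, 0.04)
print(f"CL = {cl:.3f}, CD = {cd:.3f}, Cm = {cm:.3f}")
```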


4. Heat transfer rate measurements 

A small fraction of the kinetic energy of a flight vehicle is converted into heat energy and 
transferred to the body from the fluid medium surrounding the vehicle by convection. At 
hypersonic speeds, this small fraction can result in an enormous amount of heat energy and 
hence aerodynamic heating becomes a major design concern. For designing the thermal 
protection system of the flight vehicle, one needs to know the convective heat transfer rates 
to the body as accurately as possible. Heat transfer data that can be properly correlated 
to the flight are measured in the shock tunnel by simulating the Mach number, Reynolds 
number and the total temperature of the flight. 

The HST1 tunnel has been used extensively to measure the heat transfer data on different 
bodies of interest at various Mach numbers and over a spectrum of Reynolds numbers. 
Details of these measurements along with the model details and important results are 
summarized in the following sections. 
Figure 4. Variation of the experimental and theoretical values of the lift coefficient
with angle of attack at different flow Mach numbers. (a) Blunt nose cone-cylinder with
10° flare-fins; blunt nose cone-cylinder with 5° flare-fins; blunt nose cone-cylinder.
(b) Sharp nose cone-cylinder; blunt nose cone-cylinder; blunt nose cone-cylinder with
5° flare-fins; blunt nose cone-cylinder with 10° flare-fins.
Figure 5. Variation of the experimental and theoretical values of the drag coefficient
with angle of attack for different flow Mach numbers; (a) and (b) as in figure 4.


4.1 Heat transfer over a flat plate

A flat plate model of 22.9 cm length and 20 cm width, made of an aluminium alloy and with a
sharp leading edge as shown in figure 7, has been used for heat transfer data
measurements. The nose angle of 35° is chosen so as to ensure an attached shock wave at
the bottom of the plate. The thickness of the model is fixed such that it can accommodate the 
heat transfer gauges mounted flush with the flat plate surface. These gauges are mounted 
along the centreline at locations indicated in the figure. 

Platinum thin film gauges on the surface of the flat plate are used as fast response 
thermometers to sense the surface temperature history during the hypersonic flow over the 
model. The thin film of platinum is obtained by firing platinum paint on an insulating surface 
such as a Pyrex rod of 9 mm length and 10 mm diameter (Baskaran 1977; Remesh 1996). 
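A thin-film gauge used as a resistance thermometer obeys R = R0[1 + α_R (T - T0)]. Under constant-current excitation (the gauges here are energised at 20 mA), a voltage jump ΔV therefore corresponds to ΔT = ΔV/(I R0 α_R). The review converts the resulting temperature history to heat transfer rate with an analogue RC network. As a digital counterpart, the sketch below swaps in the Cook-Felderman summation for one-dimensional conduction into a semi-infinite substrate. This is a different technique from the one used in the review. The gauge resistance R0 = 50 Ω, the coefficient α_R = 0.0025 /K and the substrate property ρck are hypothetical values.

```python
import math
from typing import List, Sequence


def film_delta_t(delta_v: float, current: float = 0.020,
                 r0: float = 50.0, alpha_r: float = 0.0025) -> float:
    """Surface-temperature rise from the voltage jump of a constant-current film.

    dV = I * R0 * alpha_R * dT; R0 and alpha_R are hypothetical gauge values.
    """
    return delta_v / (current * r0 * alpha_r)


def cook_felderman(times: Sequence[float], temps: Sequence[float],
                   rho_c_k: float) -> List[float]:
    """Heat flux at each sample from the surface-temperature history, assuming
    1-D conduction into a semi-infinite substrate (piecewise-linear T)."""
    coeff = 2.0 * math.sqrt(rho_c_k / math.pi)
    q = [0.0]
    for n in range(1, len(times)):
        s = 0.0
        for i in range(1, n + 1):
            s += (temps[i] - temps[i - 1]) / (
                math.sqrt(times[n] - times[i]) + math.sqrt(times[n] - times[i - 1]))
        q.append(coeff * s)
    return q


# Check against the exact response to a constant flux q0: T = 2 q0 sqrt(t / (pi rho c k))
beta, q0 = 2.0e6, 1.0e5
times = [i * 1e-5 for i in range(201)]
temps = [2.0 * q0 * math.sqrt(t) / math.sqrt(math.pi * beta) for t in times]
print(f"recovered flux: {cook_felderman(times, temps, beta)[-1]:.0f} W/m^2")
```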
Figure 6. Variation of the experimental and theoretical values of the pitching moment coefficient with the angle of attack for different flow Mach numbers: (a) and (b) as in figure 4.
Table 2. List of components used in the circuit diagram given in figure 8. R = resistor; P = potentiometer; D = diode; T = transistor; C = capacitor.

Component        Specifications             Component          Specifications
R1               4.7 kΩ, 1/4 W              P1                 1 kΩ, 10 turn
R2               100 Ω, 1/2 W               P2                 100 kΩ, 10 turn
R3               10 kΩ, 1/4 W               P3                 250 Ω, 10 turn
R5               3.3 kΩ, 1/4 W              D1                 5.6 V Zener
R6               120 kΩ, 1/4 W              D2                 1N4002
R7               1.8 kΩ, 1/4 W              T1                 SL 100
R10 to R19       10 kΩ, 1/4 W, ±1%          T2                 BC 109
R20 to R39       10 kΩ, 1/4 W, ±5%          IC1                LM 318
R40              5 kΩ, 1/4 W, ±1%           IC2                μA 741
R41              2.7 kΩ, 1/4 W              IC3                LM 324
R42              100 Ω, 1/4 W               IC4                SN 74121
R43 & R44        1 kΩ, 1/4 W                C1                 100 μF, 25 V
R45              100 kΩ, 1/4 W              C2                 0.1 μF, 30 V
R46              10 kΩ, 1/4 W               C3                 1 μF, 20 V
R47              100 kΩ, 1/4 W              C4                 0.1 μF, 30 V
R48 to R50       1 kΩ, 1/4 W                C5                 0.1 μF, 30 V
R51              100 kΩ, 1/4 W              C7                 0.1 μF, 25 V
R52              10 kΩ, 1/4 W               C8                 0.1 μF, 30 V
R53              100 kΩ, 1/4 W              C9, C42, C44
R54              330 Ω, 1/4 W               C47, C49           100 μF, 25 V
R55              220 Ω, 1/4 W
R56              10 kΩ, 1/4 W               C10
                                            C11 to C40         2200 pF, ±5%, 60 V
                                            C41, C43, C45      0.1 μF, 30 V
                                            C51                0.01 μF, 30 V
                                            C46, C48 & C50

The film is energised with a constant current of 20 mA. The rise in surface temperature during the flow changes the resistance of the film, which produces a corresponding voltage change across the gauge. The variation of the film resistance with time therefore represents the time history of the surface temperature during the flow. This surface temperature history can be converted directly to heat transfer rate by an analogue electrical RC network, using the analogy between heat conduction into the body and current conduction in the network. The circuit, consisting of a large number of RC combinations with time constants of the order of microseconds, is shown in figure 8, and the values of the components used in this circuit are given in table 2. The complete instrumentation set-up for heat transfer rate measurements in the shock tunnel is shown in figure 9. Data from the heat transfer gauges are acquired and processed using a multichannel transient recorder (Data Lab, UK) with a sampling rate of 2 million samples per second. Details of the data acquisition system coupled to a computer for processing the data have been reported earlier (Reddy & Reddy 1988).
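The RC network performs, in hardware, the inversion of the surface temperature history into heat flux. A digital counterpart is sketched below as an illustration only, not as the processing used in HST1. It assumes a linear film, whose resistance rises with temperature through a temperature coefficient alpha_r that the text does not give, and it substitutes the Cook-Felderman summation for one-dimensional conduction into a semi-infinite backing, a standard technique, for the analogue network. The names `film_temperature_rise` and `cook_felderman` are hypothetical.

```python
import math

def film_temperature_rise(delta_v, current, r0, alpha_r):
    """Surface temperature rise (K) of a constant-current thin-film gauge.

    Assumes a linear film, R = R0 * (1 + alpha_r * dT), so the voltage
    change across the film is dV = I * R0 * alpha_r * dT.
    """
    return delta_v / (current * r0 * alpha_r)

def cook_felderman(times, temps, sqrt_rho_c_k):
    """Heat flux history (W/m^2) from sampled surface temperatures.

    One-dimensional conduction into a semi-infinite backing, with the
    temperature taken as piecewise linear between samples.
    """
    coeff = 2.0 * sqrt_rho_c_k / math.sqrt(math.pi)
    q = [0.0]
    for n in range(1, len(times)):
        total = 0.0
        for i in range(1, n + 1):
            denom = math.sqrt(times[n] - times[i]) + math.sqrt(times[n] - times[i - 1])
            total += (temps[i] - temps[i - 1]) / denom
        q.append(coeff * total)
    return q

# Check with a step in heat flux q0, for which the exact surface
# temperature rise is 2 q0 sqrt(t) / sqrt(pi * rho c k).
q0 = 1.0e5     # W/m^2 (10 W/cm^2), illustrative value
e = 1.5e3      # assumed sqrt(rho c k) of the backing, J/(m^2 K s^0.5)
dt = 0.5e-6    # 2 million samples per second
t = [k * dt for k in range(500)]
T = [2.0 * q0 * math.sqrt(tk) / math.sqrt(math.pi) / e for tk in t]
q = cook_felderman(t, T, e)
print(f"recovered/applied flux at t = {t[-1] * 1e3:.3f} ms: {q[-1] / q0:.4f}")
```

The summation costs O(n²) operations for n samples, which is acceptable for the few hundred samples of a short shock-tunnel run; against the exact constant-flux solution it recovers the applied flux to well within 1%.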
Figure 7. Schematic diagram of a flat plate model.


Results of the heat transfer measurements on the flat plate in HST1 are presented in figures 10 to 14. All the data were measured at zero angle of attack at a flow Mach number of 5.75, with the Reynolds number varying in the range 2 × 10⁴ to 2.5 × 10⁵. The higher Reynolds numbers were obtained using helium as the driver gas in the shock tunnel, and the lower values using nitrogen. Heat transfer data in terms of the Stanton number are presented in figure 10 as a function of the local Reynolds number along the plate surface. These data have been used to estimate the skin friction coefficient using the Reynolds analogy between skin friction and heat transfer coefficient (Anderson 1989; Remesh 1996). The variation of skin friction along the flat plate surface is shown in figure 11 as a function of local Reynolds number. The Stanton number and the skin friction coefficient at different locations along the centre line of the plate are shown in figures 12 and 13. Viscous effects within the boundary layer raise the temperature and reduce the gas density, which increases the boundary-layer thickness downstream of the leading edge of the flat plate. This growing boundary layer in turn induces a viscous interaction which affects the surface pressure distribution, lift, drag and stability of hypersonic vehicles. The viscous interaction effects can be assessed with the similarity parameter

$$\bar{\chi} = \frac{M_\infty^3 \sqrt{C}}{\sqrt{Re}}, \qquad C = \frac{\rho_w \mu_w}{\rho_e \mu_e},$$

where M_∞ is the freestream Mach number, Re is the Reynolds number, ρ is the density and μ is the viscosity coefficient, the subscripts w and e denoting conditions at the wall and at the edge of the boundary layer. In the present study the value of this parameter was in the range 0.3 to 1.5, indicating that the viscous interaction is very weak. The dependence of the heat transfer coefficient on the viscous interaction parameter is shown in figure 14. Theoretical data computed by solving the full Navier-Stokes equations in conservative form are also plotted in figures 10 to 14, along with the experimental data. These equations are solved using a recently developed 2-D laminar flow code based on the finite volume method (Sreekanth 1993).
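The two relations used in this paragraph can be evaluated directly. The sketch below assumes the Colburn form of the Reynolds analogy, St/(C_f/2) = Pr^(-2/3), with Pr = 0.71; the text does not say which analogy factor was used. It also assumes C = 1 when evaluating $\bar{\chi}$ at the two ends of the measured Reynolds number range. The function names are hypothetical.

```python
import math

def skin_friction_from_stanton(st, prandtl=0.71):
    """Reynolds analogy in Colburn form (an assumption): St / (Cf/2) = Pr**(-2/3)."""
    return 2.0 * st * prandtl ** (2.0 / 3.0)

def chapman_rubesin(rho_w, mu_w, rho_e, mu_e):
    """C = rho_w mu_w / (rho_e mu_e)."""
    return (rho_w * mu_w) / (rho_e * mu_e)

def viscous_interaction_parameter(mach, reynolds, c=1.0):
    """chi_bar = M**3 * sqrt(C) / sqrt(Re)."""
    return mach ** 3 * math.sqrt(c) / math.sqrt(reynolds)

# Ends of the measured Reynolds number range at Mach 5.75, taking C = 1.
chi_low_re = viscous_interaction_parameter(5.75, 2.0e4)
chi_high_re = viscous_interaction_parameter(5.75, 2.5e5)
print(f"chi_bar from {chi_high_re:.2f} to {chi_low_re:.2f}")
```

With C = 1 the two ends of the Reynolds number range give $\bar{\chi}$ of about 0.38 and 1.34, inside the 0.3 to 1.5 range quoted above.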

The above results show that the measured heat transfer data match the theoretical values 
very well. Since the local Reynolds number is well below the critical Reynolds number for 
the transition from laminar to turbulent boundary layer, the flow over the entire flat plate 
is laminar. Therefore, the heat transfer coefficient decreases along the length of the plate. 
Since the interaction parameter is found to be very low, the viscous interaction effect is 
very weak and its effect on the heat transfer coefficient along the length of the plate is 
negligible. 



Figure 8. Circuit diagram of the analogue network used for conversion of the temperature signal to a heat transfer signal. 
Figure 9. Schematic diagram of the completely instrumented shock tunnel HST1.
Figure 10. Variation of the heat transfer coefficient with Reynolds number over the surface of a flat plate.


Figure 11. Variation of the skin friction coefficient with Reynolds number over the 
surface of a flat plate. 
Figure 12. Variation of the heat transfer coefficient at different locations on the
surface of a flat plate. 



Figure 13. Variation of the skin friction coefficient at different locations on the 
surface of a flat plate. 
Figure 14. Variation of the heat transfer rate coefficient as a function of the viscous
interaction parameter. 

4.2 Heat transfer rate measurements over an SLV-3 model 

The most important data generated in the HST1 tunnel soon after it was commissioned in 1975 were the heat transfer rates on a model of the ISRO satellite launch vehicle SLV-3, shown schematically in figure 15. The vehicle experiences maximum heating at an altitude of approximately 14.7 km (Reddy & Viswanath 1977), where the corresponding flight conditions are a Mach number of 3.66, a velocity of 1.08 km/s, a unit Reynolds number of 15.5 × 10⁶ m⁻¹ and a total temperature of 800 K. Heat transfer rates were therefore measured in the HST1 tunnel at Mach numbers 4.0 and 5.5, with the unit Reynolds number varying in the range 2 × 10⁵ to 3 × 10⁶ m⁻¹.
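As a quick consistency check on these flight conditions, the static temperature follows from the Mach number and velocity, and the total temperature from the isentropic relation T_0 = T(1 + (γ − 1)M²/2). The sketch assumes calorically perfect air with γ = 1.4 and R = 287 J/(kg K); these constants are not stated in the text.

```python
GAMMA = 1.4    # assumed: calorically perfect air
R_AIR = 287.0  # J/(kg K), assumed gas constant for air

def static_temperature(mach, velocity):
    """Static temperature (K) from Mach number and velocity (m/s)."""
    a = velocity / mach                 # speed of sound
    return a * a / (GAMMA * R_AIR)

def total_temperature(mach, t_static):
    """Isentropic total temperature (K)."""
    return t_static * (1.0 + 0.5 * (GAMMA - 1.0) * mach ** 2)

t_inf = static_temperature(3.66, 1080.0)
t_0 = total_temperature(3.66, t_inf)
print(f"T = {t_inf:.1f} K, T0 = {t_0:.1f} K")
```

The result, a static temperature of about 217 K and a total temperature of about 797 K, agrees with the quoted total temperature of 800 K, and the static value is close to the 216.65 K of the standard atmosphere at that altitude.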





Figure 15. Schematic diagram of the SLV-3 model. 
Figure 16. Distribution of the heat transfer rate coefficients on the nose cone.


The results of the measurements are presented in figure 16 in the form of the Stanton number St∞ (based on the freestream values of the flow field) along the conical surface of the SLV-3 nose cone. The Reynolds number in these measurements is low because air was used as the driver gas. A trip positioned at s/R = 1.4 on the model was found to have negligible effect at these low Reynolds numbers. The results are plotted in figures 17 and 18 as local Stanton number St against local Reynolds number u_e s/ν_e for Mach numbers 4 and 5.5 respectively. The higher Reynolds numbers were obtained using a combination of hydrogen driver gas and high driver pressures. Corresponding theoretical heat transfer rates predicted using flat plate theory, with appropriate transformations for cone flow, are also shown in these figures. For the laminar case, two estimates employing a reference temperature formula described by Korkegi (1962) are used. For the turbulent case, two estimates have been made: (i) by the method of Korkegi (1962), which again uses a reference temperature, and (ii) by the Van Driest II formulation (Hopkins 1972). Theoretical estimates of Stanton numbers are based on the local cone Mach numbers of 1.8 and 1.9 at Mach 4 and 5.5 respectively, and on a wall temperature ratio T_w/T_r = 0.4 at both Mach numbers.
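Korkegi's (1962) reference temperature formula is not reproduced in this paper, so the sketch below substitutes Eckert's reference temperature relation, T*/T_e = 1 + 0.032 M_e² + 0.58 (T_w/T_e − 1), together with the laminar flat plate result St* = 0.332 Pr^(-2/3) (Re_x*)^(-1/2) and the Mangler factor √3 for a cone. It illustrates the kind of estimate described, not the authors' calculation; the laminar recovery factor √Pr, the viscosity law μ ∝ T^0.76, constant pressure across the boundary layer and Pr = 0.71 are all assumptions, and the function names are hypothetical.

```python
import math

def eckert_reference_ratio(mach_e, tw_over_te):
    """T*/T_e from Eckert's reference temperature relation."""
    return 1.0 + 0.032 * mach_e ** 2 + 0.58 * (tw_over_te - 1.0)

def laminar_stanton(re_x, mach_e, tw_over_tr, pr=0.71, gamma=1.4,
                    omega=0.76, cone=True):
    """Edge-based local Stanton number of a laminar boundary layer.

    Flat plate relation evaluated at reference conditions and converted
    back to edge conditions; cone=True applies the Mangler factor sqrt(3).
    """
    recovery = math.sqrt(pr)                         # laminar recovery factor
    tr_over_te = 1.0 + recovery * 0.5 * (gamma - 1.0) * mach_e ** 2
    tw_over_te = tw_over_tr * tr_over_te
    ts = eckert_reference_ratio(mach_e, tw_over_te)  # T*/T_e
    # rho*/rho_e = T_e/T* at constant pressure; mu_e/mu* = (T_e/T*)**omega
    re_star = re_x * ts ** -(1.0 + omega)
    st_star = 0.332 * pr ** (-2.0 / 3.0) / math.sqrt(re_star)
    st_edge = st_star / ts                           # St_e = St* rho*/rho_e
    return math.sqrt(3.0) * st_edge if cone else st_edge

# Mach 4 case of the text: local cone Mach number 1.8, T_w/T_r = 0.4.
st_cone = laminar_stanton(1.0e5, 1.8, 0.4)
st_flat = laminar_stanton(1.0e5, 1.8, 0.4, cone=False)
print(f"St (cone) = {st_cone:.3e}, St (flat plate) = {st_flat:.3e}")
```

Because the estimate scales as Re_x^(-1/2), quadrupling the local Reynolds number halves the predicted Stanton number, which is the laminar trend against which the data in figures 17 and 18 are compared.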
Figure 18. Heat transfer correlation with local Reynolds number at Mach 5.5.





Figure 19. (a) Bulbous heat shield models for heat transfer (I, scale 1:64), pressure (II, 1:75) and force and oil-flow (III, 1:80) measurements; gauge stations 1, 2 and 3 lie at X/R_n = 4.357, 10.786 and 11.845 (s/R_n = 5.157, 11.584 and 12.693; X/D_2 = 1.089, 2.697 and 2.961). (b) Aluminium model with Macor inserts (1, 2, 3) for the heat transfer gauges.
Figure 20. (a) Photograph of the bulbous heat shield model with the gauges; (b) electrical leads from the gauges.


The results presented in figures 16 to 18 show that, at relatively low Reynolds numbers, the measured heat transfer rates agree very well with the theory for laminar flow at both freestream Mach numbers. At Mach 4.0, natural transition to turbulent flow occurs around a local Reynolds number of 2 × 10⁵. After some overshoot in the transition region, the measured data appear to settle to the values predicted by Van Driest II at higher Reynolds numbers. At Mach 5.5, on the other hand, there is no sign of transition to turbulence in the range of Reynolds numbers covered in the experiments.

Thus, these investigations have shown that at Mach 4 the Van Driest II flat plate theory for a turbulent boundary layer (Hopkins 1972), with appropriate modifications for cone flow, may be used to obtain reasonable estimates of heat flux on the conical surface at high Reynolds numbers. At Mach 5.5, no transition to turbulent flow occurs at the low Reynolds numbers considered in these investigations.

Figure 21. Windward side heat transfer rate distribution at 0° angle of attack using nitrogen driver gas.

4.3 Heat transfer measurements over a bulbous heat shield model of a launch vehicle (PSLV)

Satellite launch vehicles usually have bulbous heat shield configurations so that satellites of different shapes and sizes can be launched. Estimating the flow fields over such geometries at hypersonic Mach numbers is extremely complex, especially at an angle of attack. Theoretical analysis of such flow fields would be a Herculean task demanding extremely powerful computer systems with very advanced numerical algorithms. An alternative is to measure the aerodynamic data experimentally in hypersonic tunnels.
Figure 22. Windward side heat transfer rate distribution at 10° angle of attack using
nitrogen driver gas. 


Detailed heat transfer measurements have been made on the bulbous nose cone model in the HST1 tunnel at a freestream Mach number of 6, a total temperature of 1830 K and a Reynolds number (based on the nose radius) of up to 2.5 × 10⁴ (Reddy et al 1988; Reddy & Srinivasa 1990; Srinivasa 1991). The lower values of the Reynolds number were obtained using nitrogen as the driver gas, and the higher values with hydrogen as the driver gas.

The blunt cone-bulbous heat shield model used for the heat transfer measurements is shown in figure 19a; it is a scale model of the PSLV nose cone. The 195 mm long, 50 mm diameter model is fabricated from aluminium alloy with cut-outs for insertion of the heat transfer gauges. The spherical nose cap is made of the insulating backing material so that one of the thin film heat transfer gauges, about 5 mm long, can be formed in the stagnation region. For these studies a machinable glass material called Macor is used, for the first time in India, as the insulating backing material. This special glass material has nearly the same properties as Pyrex glass but has the additional attractive property of machinability. Therefore, unlike Pyrex glass, it can easily be formed into intricate model shapes and used as backing material for heat transfer gauges, so that geometric similarity of heat transfer models is achieved easily. The gauges are also found to be more durable than gauges made with Pyrex glass as backing material. Macor gauge inserts of 10 mm width along the body of the model are made in two pieces, as shown in figure 19b, so that if some of the gauges are damaged only that piece needs to be replaced. Photographs of an instrumented model are shown in figure 20. The instrumented model can be mounted in the test section at angles of attack in the range −18° to +18° and at any roll angle, so that the heat transfer rate distribution can be measured over the entire model.

Figure 23. Windward side heat transfer rate distribution at 17° angle of attack using nitrogen driver gas.

A series of tests was first conducted using nitrogen as the driver gas in order to measure heat transfer rates at low total temperature, which ensures perfect-gas behaviour of the test gas. In these tests, typical static pressures and temperatures are 400 N/m² and 850 K respectively, and the freestream Reynolds number based on the nose radius is about 1 × 10⁴. These low temperature tests were undertaken essentially to verify the functioning of the analogue network used for converting the temperature signals from the heat transfer gauges to heat transfer signals. However, typical ascent flight conditions of a satellite launch vehicle at Mach number 5.5 are Re_n = 2.8 × 10⁴ and a total temperature of 1700 K. Hence a second series of tests was conducted at freestream conditions close to these values using hydrogen as the driver gas, which yields a freestream Mach number of 5.75, a Reynolds number of 2.5 × 10⁴ based on the nose radius, and a stagnation temperature of 1830 K.

Figure 24. Heat transfer rate distribution at 0° angle of attack using hydrogen driver gas.

The heat transfer data measured along the windward generator (φ = 0°) at angles of attack of 0°, 10° and 17° with nitrogen as the driver gas are shown in figures 21 to 23 respectively. Heat transfer rates are plotted as Stanton number along the surface of the model, normalised with respect to the stagnation point Stanton number estimated using the Fay & Riddell (1958) method. Corresponding data using hydrogen as the driver gas are presented in figures 24 to 26. Heat transfer data along the leeward side (φ = 180°), measured at 10° and 17° angles of attack with hydrogen driver gas, are shown in figures 27 and 28 respectively.
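The stagnation point normalisation can be sketched with the Fay & Riddell correlation for an axisymmetric body, q = 0.763 Pr^(-0.6) (ρ_w μ_w)^0.1 (ρ_e μ_e)^0.4 (du_e/dx)^0.5 (h_0 − h_w), written here with a Lewis number of 1 so that the dissociation correction term drops out, and with the stagnation point velocity gradient taken from Newtonian theory. All property values in the example are hypothetical, not the HST1 test conditions, and the function names are illustrative.

```python
import math

def newtonian_velocity_gradient(nose_radius, p_stag, p_inf, rho_stag):
    """Stagnation-point velocity gradient du_e/dx (1/s), Newtonian theory."""
    return math.sqrt(2.0 * (p_stag - p_inf) / rho_stag) / nose_radius

def fay_riddell_heat_flux(rho_w, mu_w, rho_e, mu_e, h0, h_w, due_dx, pr=0.71):
    """Axisymmetric stagnation-point heat flux (W/m^2), Lewis number 1."""
    return (0.763 * pr ** -0.6 * (rho_w * mu_w) ** 0.1 * (rho_e * mu_e) ** 0.4
            * math.sqrt(due_dx) * (h0 - h_w))

def normalise(q_surface, q_stag):
    """St/St_0 equals q/q_0 when both Stanton numbers share the same
    freestream reference and enthalpy difference."""
    return [q / q_stag for q in q_surface]

# Hypothetical stagnation conditions (SI units), for illustration only.
props = dict(rho_w=0.2, mu_w=2.0e-5, rho_e=0.03, mu_e=6.0e-5,
             h0=1.9e6, h_w=3.0e5)
grad_25mm = newtonian_velocity_gradient(0.025, 3.0e4, 400.0, 0.03)
grad_50mm = newtonian_velocity_gradient(0.050, 3.0e4, 400.0, 0.03)
q_25mm = fay_riddell_heat_flux(due_dx=grad_25mm, **props)
q_50mm = fay_riddell_heat_flux(due_dx=grad_50mm, **props)
print(f"q(25 mm) = {q_25mm:.3e} W/m^2, q(50 mm) = {q_50mm:.3e} W/m^2")
```

Since q scales as the square root of the velocity gradient, and the gradient as 1/R_n, doubling the nose radius reduces the stagnation heating by a factor of √2.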
Figure 25. Windward side heat transfer rate distribution at 10° angle of attack using
hydrogen driver gas. 


The scatter in the measured heat transfer rates at 0° angle of attack (figure 21) is due to the high noise level of the analogue network at the heating rates below 1 W/cm² obtained with nitrogen driver gas. Figures 22 and 23 show that at higher angles of attack the heat transfer rates over the cone on the windward side are high and the scatter is reduced considerably. The enhancement of the heating rate by the boat tail-cylinder compression corner is also seen in these figures. The lower heating rates over the first cylinder, just after the cone-cylinder junction, compared with the values further downstream, indicate a possible interaction of the boundary layer with compression waves arising from the interaction of the Prandtl-Meyer expansion waves with the bow shock wave. With hydrogen driver gas, the windward data at 0° angle of attack (figure 24) show that the heating rates at the cone-cylinder and cylinder-boat tail junctions decrease, as expected, because of the Prandtl-Meyer expansions. The results also indicate that, over the first cylinder, the boundary layer thickness increases immediately downstream of the cone-cylinder junction and decreases subsequently, as inferred from the decrease and subsequent increase of the heat transfer rate coefficient. The windward side heating rates at 10° angle of attack shown in figure 25 are higher than the corresponding heating rates at 0° angle of attack, but show a similar trend. On the first cylinder, the interaction of reflected compression waves with the boundary layer also occurs at this angle. These results indicate the existence of a small separation bubble at the boat tail-cylinder junction. Also, at this angle of attack the compression corner appears to produce a fairly strong shock, giving rise to an appreciable jump in heat transfer rate. At 17° angle of attack the windward side heating rates shown in figure 26 are significantly higher, especially over the cone (20° half-angle), where they are close to the stagnation point heating rates. The drop in heating rates due to the Prandtl-Meyer expansion at the cone-cylinder junction appears to begin over the cone itself, and the expansion is no longer centred but has spread out over the other side of the junction. Here also, high heating rates over the first cylinder and interaction of compression waves with the boundary layer are seen. The flow is found to be fully attached on the windward side.

Figure 26. Windward side heat transfer rate distribution at 17° angle of attack using hydrogen driver gas.
Figure 27. Leeward side heat transfer rate distribution at 10° angle of attack using
hydrogen driver gas. 


Leeward side heating rates at both 10° and 17° angle of attack (figures 27 and 28) are lower than the corresponding values on the windward side. At 10° angle of attack, the heating rates are very low between s/R_n = 7.5 and 10, indicating flow separation. Higher heating rates are observed at the end of the cylinder, probably due to reattachment of the flow, and this reattachment enhances the heating rates over the second cylinder appreciably. At 17° angle of attack the crossflow separation appears to have moved slightly upstream compared with that at 10° angle of attack, as shown in figure 28. However, towards the aft end of the first cylinder, heating rates are higher than those at 10° angle of attack, owing to significant flow reattachment at this higher angle of attack.

Experimental heat transfer rates are compared with the data computed for zero angle 
of attack in figure 29. The computations are made using 2-D boundary-layer code (Tobak 
& Peake 1979) and viscous shock-layer code (Swaminathan 1983). The predicted heating 
rates match very well with the experimental rates in general, the viscous shock-layer results 
being closer to the measured values than the 2-D boundary layer results. 
Figure 28. Leeward side heat transfer rate distribution at 17° angle of attack using
hydrogen driver gas. 

5. Flow visualization using electrical discharge technique 

Visualization of the complex flowfields around hypersonic vehicles is an important aspect of experimental reentry aerodynamics. This is conventionally achieved with optical systems such as the schlieren method, interferometry, shadowgraphs and holography. Except for holography, none of these systems is useful for three-dimensional shock wave visualization, because in these systems the visualization is achieved by passing light along optical axes perpendicular to the direction of the gas density gradient. The holographic technique, on the other hand, is very sensitive to mechanical vibrations and hence is very difficult to use for flow visualization in wind tunnel tests. In addition, since the run time and the freestream density of the test gas in the test section of a hypersonic shock tunnel are very low, flow visualization using these techniques is extremely difficult.

We have recently developed a new electrical discharge technique to visualize the hypersonic flowfields around test models in a hypersonic shock tunnel (Jagadeesh et al 1996). The basic principle of this technique is that the spontaneous light emission from a discharge zone depends on the local gas density. Hence, when an electrical discharge takes place across a shock wave, the position of the shock wave can be clearly seen, as the light intensity from the shock wave differs from that of the freestream or the shock layer. The flowfield around a flat plate with a sharp leading edge at an angle of attack, visualized using this technique in the HST1 shock tunnel at Mach 5.75, is shown in figure 30. The technique can be extended to three-dimensional flow visualization by a suitable arrangement of the electrodes and by photographing the electrical discharge from either the downstream or the upstream direction of the flow.

Figure 29. Comparison of experimental heat transfer rates with the predicted values at 0° angle of attack.

6. Enhancement of performance capabilities of HST1 

As is evident from the above results, one of the serious limitations of the HST1 in its current form is in the simulation of Reynolds number. The Mach number limitation is overcome by adding the wind tunnel portion to the shock tube. However, increasing the Mach number beyond a specific value would reduce the density in the test section drastically. One way to improve the performance capabilities is to increase the temperature and the pressure of the test gas in the stagnation chamber at the nozzle entrance. This can be achieved by increasing the strength of the shock wave in the shock tube by using a higher pressure difference across the primary diaphragm. Since the shock tube is currently made of aluminium, the pressure in the driver section cannot be increased to higher values. Hence, to overcome this limitation, we plan to strengthen the shock tube by adding a stainless steel jacket along the entire length of the tube. This would enhance the shock tube performance with minimum fabrication work.

Figure 30. The flow field around a flat plate with a sharp leading edge at 6° angle of attack visualized using the electrical discharge technique. The flow conditions are: Mach number = 5.75, freestream temperature = 173 K and freestream density = 1.0 × 10⁻² kg/m³.
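The link between diaphragm pressure ratio and shock strength is the ideal shock tube equation, p4/p1 = [1 + 2γ1(M_s² − 1)/(γ1 + 1)] [1 − (γ4 − 1)(a1/a4)(M_s − 1/M_s)/(γ1 + 1)]^(−2γ4/(γ4 − 1)), where subscript 4 denotes the driver gas and 1 the driven gas. The sketch below inverts it for M_s by bisection. It neglects viscous losses and the finite diaphragm opening time, so real shock Mach numbers are lower; the helium and air gas constants in the example are assumptions, and the function names are hypothetical.

```python
import math

def diaphragm_pressure_ratio(ms, g1=1.4, g4=1.4, a1_over_a4=1.0):
    """p4/p1 required to drive a shock of Mach number ms (ideal theory)."""
    p2_p1 = 1.0 + 2.0 * g1 / (g1 + 1.0) * (ms * ms - 1.0)
    term = 1.0 - (g4 - 1.0) / (g1 + 1.0) * a1_over_a4 * (ms - 1.0 / ms)
    if term <= 0.0:
        return math.inf  # beyond the limiting shock Mach number
    return p2_p1 * term ** (-2.0 * g4 / (g4 - 1.0))

def shock_mach_number(p4_p1, g1=1.4, g4=1.4, a1_over_a4=1.0, tol=1e-10):
    """Solve diaphragm_pressure_ratio(ms) = p4_p1 for ms by bisection."""
    lo, hi = 1.0, 2.0
    while diaphragm_pressure_ratio(hi, g1, g4, a1_over_a4) < p4_p1:
        lo, hi = hi, 2.0 * hi  # expand the bracket upwards
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if diaphragm_pressure_ratio(mid, g1, g4, a1_over_a4) < p4_p1:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Assumed gas data: air as driven gas, helium driver, both initially at 300 K.
A1_OVER_A4_HE = math.sqrt((1.4 * 287.0) / (5.0 / 3.0 * 2077.0))
ms_air = shock_mach_number(100.0)
ms_helium = shock_mach_number(100.0, g4=5.0 / 3.0, a1_over_a4=A1_OVER_A4_HE)
print(f"p4/p1 = 100: Ms = {ms_air:.2f} (air driver), {ms_helium:.2f} (helium driver)")
```

Raising p4/p1 strengthens the shock, and a light driver gas with a higher sound speed a4 yields a stronger shock for the same pressure ratio, which is consistent with the use of helium and hydrogen drivers in the higher-performance runs described above.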

Real gas effects are more effectively simulated in another version of the shock tunnel known as the free-piston driven shock tunnel (FPST), also referred to as the hypervelocity shock tunnel, since in this tunnel the velocity of the flow is simulated instead of the Mach number (Reddy et al 1993, 1994). The basic design of an FPST is shown schematically in figure 31. In this tunnel the helium driver gas is preheated to about 5000 K by an adiabatic compression in the compression tube, which produces shock Mach numbers M_s ≈ 15. Specific stagnation enthalpies exceeding 10 MJ/kg are commonly achieved in these tunnels. Installation of an FPST with the specifications given in table 3 at the Indian Institute of Science is in progress.

Figure 31. Schematic diagram of the proposed IISc free piston driven hypersonic shock tunnel.

Table 3. Specifications of the proposed IISc free piston-driven hypersonic shock tunnel.

Parameter                                         Value
Diameter of the compression tube                  0.165 m
Diameter of the shock tube                        0.036 m
Length of the compression tube                    10.0 m
Length of the shock tube                          4.5 m
Volumetric compression ratio                      60.0
Mass of the piston                                20.3 kg
Initial pressure of air in the reservoir          66.5 atm*
Typical shock Mach number                         18
Pressure at the diaphragm rupture                 948 atm*
Temperature at the diaphragm rupture              4693 K
Typical estimated stagnation enthalpy             45 MJ/kg
Typical estimated test section flow velocity      9.5 km/s

* 1 atm ≈ 10⁵ Pa.
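The driver conditions in table 3 can be compared with an ideal estimate of the adiabatic compression described above. The sketch below treats the piston compression of helium as reversible and adiabatic (γ = 5/3) with the volumetric compression ratio of 60 from table 3. The initial helium fill of 1 atm and 300 K is an assumption, since the text does not state it, and real compression involves heat losses and piston dynamics that this ignores.

```python
def adiabatic_compression(p1, t1, volume_ratio, gamma=5.0 / 3.0):
    """Reversible adiabatic compression of an ideal gas by volume_ratio = V1/V2.

    p V**gamma and T V**(gamma - 1) stay constant; gamma = 5/3 for helium.
    """
    p2 = p1 * volume_ratio ** gamma
    t2 = t1 * volume_ratio ** (gamma - 1.0)
    return p2, t2

# Assumed initial helium fill: 1 atm and 300 K (not stated in the text).
p_rupture, t_rupture = adiabatic_compression(1.0, 300.0, 60.0)
print(f"p = {p_rupture:.0f} atm, T = {t_rupture:.0f} K")
```

With these assumptions the estimate is roughly 920 atm and 4600 K, close to the 948 atm and 4693 K listed in table 3.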

In all the measurements reported in this review the wind tunnel portion of the HST1 was made of mild steel. An important requirement of flow simulation in wind tunnels is a very clean test gas. To ensure this, the inner surface of the wind tunnel section was coated with a special paint to prevent rust. However, to eliminate the rust problem completely, we have recently added a new wind tunnel section made of stainless steel, along with a new pumping system capable of achieving vacuum levels of 10⁻⁶ mbar in a short time, to reduce the turnaround time.


7. Conclusions 

The importance of a hypersonic shock tunnel for laboratory simulation of the flowfields 
around hypersonic space vehicles is highlighted. The capabilities and the performance 
characteristics of the hypersonic shock tunnel HST1 established at the Indian Institute of 
Science, Bangalore are described. Important research work undertaken in this tunnel over 
the last two decades is reviewed. Major aerodynamic data generated in this facility include 
measurement of aerodynamic force coefficients over missile shaped bodies, heat transfer 
rate measurements for space vehicle models such as SLV-3 and PSLV at various Mach 
numbers, and also generation of heat transfer data over flat plates to be used for validation 
of CFD codes. 
Self-co-operative ternary pulse-compression sequences
Abstract. An algorithm called a Hamming scan was developed recently for
obtaining sequences with large merit factors and is adopted here to obtain such 
sequences within which there are nontrivial segments of large merit factors. 
Correlative detection of the return signal can be based simultaneously on the 
entire sequence and its segments with large merit factors. Such a coincidence 
detection scheme can be characterized by a Schur merit factor of the sequence. 
Sequences with large Schur merit factors are listed. 

Keywords. Self-co-operative sequence; auto-correlation; Hamming scan; 
Schur merit factor. 

1. Introduction 

The problem of signal design in radar consists of obtaining sequences with a prescribed finite alphabet and peaky autocorrelation. The peakiness of the autocorrelation can be characterized by a merit factor (Golay 1977). Three finite alphabets considered in the literature are binary (+1, −1), ternary (0, −1, +1) and quinquenary (0, +1, −1, +2, −2). The largest merit factors obtained so far with these alphabets are 14.0833 (Golay 1977), 20.0556 (Moharir et al 1985; Singh et al 1996; Moharir et al 1996) and 162.000 (Moharir & Rao 1996). These are achieved for the lengths 13, 23 and 7 respectively, which are rather small. As the length increases, it becomes more difficult, on average, to obtain very high merit factors. However, it is established (Moharir et al 1996; Moharir & Rao 1996) that the constraint of a binary alphabet would have to be overcome if superior merit factors are desired. But this alone may not suffice. Therefore, the notion of complementary or co-operative sequences (Golay 1961; Boehmer 1967; Tseng & Liu 1972; Venkata Rao et al 1986) has been introduced. These are sets of sequences whose co-operative merit factor is very high; it can even be infinite. The difficulty is that two or more sequences have to be transmitted, and they may fare differently over fading channels. The question, therefore, is whether the notion of co-operation can be used while still transmitting a single sequence.
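The merit factor used throughout can be computed directly from the aperiodic autocorrelation C_k = Σ_i s_i s_{i+k}, using the usual definition F = C_0² / (2 Σ_{k≥1} C_k²), where C_0 is the sequence energy (equal to N for a binary sequence). The sketch below, with hypothetical function names, reproduces the value 14.0833 quoted for length 13 using the binary Barker sequence of that length.

```python
import math

def aperiodic_autocorrelation(seq):
    """C_k = sum_i s_i * s_(i+k) for k = 0 .. N-1."""
    n = len(seq)
    return [sum(seq[i] * seq[i + k] for i in range(n - k)) for k in range(n)]

def merit_factor(seq):
    """Merit factor F = C_0**2 / (2 * sum of squared sidelobes)."""
    c = aperiodic_autocorrelation(seq)
    sidelobe_energy = sum(ck * ck for ck in c[1:])
    if sidelobe_energy == 0:
        return math.inf if c[0] > 0 else 0.0
    return c[0] ** 2 / (2.0 * sidelobe_energy)

barker13 = [1, 1, 1, 1, 1, -1, -1, 1, 1, -1, 1, -1, 1]
print(f"F(Barker 13) = {merit_factor(barker13):.4f}")
```

The same function applies unchanged to ternary sequences, whose zeros simply lower the energy C_0.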

The answer to this question is in the affirmative. A notion of towers (Moharir et al 1984) was introduced earlier. A tower was defined to be a sequence with good aperiodic autocorrelation, embedded in which there are marked segments of nontrivial lengths that also have good aperiodic autocorrelation. This sequence is transmitted, and the return signal is cross-correlated separately with this sequence and with its marked segments, maintaining the proper temporal relation among them. All the cross-correlations would simultaneously peak at the delay equal to the two-way travel time. Thus, the distance to the target can be estimated on the basis of the coincidence of the peaks in the various cross-correlations. This notion is extended here in two ways. First, no algorithm was previously available to list good towers or self-co-operative sequences; an algorithm developed for signal design is now adopted for this purpose. Second, a quantitative measure to characterize self-co-operative sequences is proposed.
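The coincidence detection of a tower can be sketched as follows: the return is correlated with the full sequence and with the marked segment, the segment correlation is shifted by the segment offset j so that both peak at the same delay, and the two are combined by a component-wise product. The example uses the Barker sequence of length 13 with a hypothetical marked segment of length 5 at j = 4 and a noiseless return delayed by 7 samples; it is an illustration with hypothetical names, not the detector of the paper.

```python
def cross_correlation(received, reference):
    """c[d] = sum_i received[d + i] * reference[i] for every full-overlap delay d."""
    m = len(reference)
    return [sum(received[d + i] * reference[i] for i in range(m))
            for d in range(len(received) - m + 1)]

def coincidence_delay(received, full_seq, segment, j):
    """Delay at which the full-sequence and segment correlations peak together."""
    c_full = cross_correlation(received, full_seq)
    c_seg = cross_correlation(received, segment)
    # A segment starting at offset j peaks at delay d + j when the
    # full sequence peaks at delay d, so align before multiplying.
    product = [c_full[d] * c_seg[d + j] for d in range(len(c_full))]
    return max(range(len(product)), key=product.__getitem__)

barker13 = [1, 1, 1, 1, 1, -1, -1, 1, 1, -1, 1, -1, 1]
received = [0] * 7 + barker13 + [0] * 10   # noiseless echo delayed by 7 samples
delay = coincidence_delay(received, barker13, barker13[4:9], 4)
print(f"estimated delay: {delay} samples")
```

Away from the true delay both correlations stay small, so their product is sharply peaked only where the peaks coincide.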

2. Earlier signal design algorithms 

The signal design problem can be viewed as an optimization problem (Bernasconi 1987; Golay & Harris 1990; De Groot et al 1992). One of the effective optimization algorithms for such combinatorial problems is the genetic algorithm (Holland 1975, 1992; De Jong 1985; Michalewicz 1992), but it gives undue importance to chance. An algorithm called the eugenic algorithm was developed recently (Singh et al 1996) and used to list ternary sequences with high merit factors. It supplements chance with the notion of a locally complete search. One component of this algorithm is a Hamming scan. Mutation in a genetic algorithm changes one element in the sequence; that is, the result is a first-order Hamming neighbour. The Hamming scan looks at all the first-order Hamming neighbours and picks the one with the largest merit factor. The process is repeated recursively as long as the merit factor continues to improve. A modification of this recursive Hamming scan is adopted as a procedure in the next section. The next step (Moharir et al 1996) was to use the Kronecker product of two sequences as a starting point for the recursive Hamming scan. This algorithm was called the SIKH algorithm and led to very good long ternary sequences, which are used in the next section for the present purpose.
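A recursive Hamming scan of the kind described can be sketched in a few lines. This version evaluates every first-order Hamming neighbour (each element replaced by each other symbol of the alphabet), moves to the best one if it improves the merit factor, and stops at a local optimum. It is only the scan component, not the eugenic or SIKH algorithm, and the function names are hypothetical.

```python
import math

def merit_factor(seq):
    """Merit factor of a finite-alphabet sequence (inf if there are no sidelobes)."""
    n = len(seq)
    c = [sum(seq[i] * seq[i + k] for i in range(n - k)) for k in range(n)]
    sidelobes = sum(ck * ck for ck in c[1:])
    if sidelobes == 0:
        return math.inf if c[0] > 0 else 0.0
    return c[0] ** 2 / (2.0 * sidelobes)

def hamming_scan(seq, alphabet=(-1, 0, 1)):
    """Move to the best first-order Hamming neighbour while F keeps improving."""
    best, best_f = list(seq), merit_factor(seq)
    while True:
        cand, cand_f = None, best_f
        for i, old in enumerate(best):
            for a in alphabet:
                if a == old:
                    continue
                trial = best[:i] + [a] + best[i + 1:]
                f = merit_factor(trial)
                if f > cand_f:
                    cand, cand_f = trial, f
        if cand is None:          # no neighbour improves: local optimum
            return best, best_f
        best, best_f = cand, cand_f

start = [1] * 13
result, f = hamming_scan(start, alphabet=(-1, 1))
print(result, round(f, 4))
```

Because the merit factor strictly increases at each step and the search space is finite, the scan always terminates.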


3. Design algorithm for self co-operative sequences 
Let

$$S_0 = (s_0, s_1, \ldots, s_{N-2}, s_{N-1})$$

be a sequence of length N. Let

$$s = (s_j, s_{j+1}, \ldots, s_{j+n-2}, s_{j+n-1})$$

be a marked segment of length n in it, with j prescribed.

The algorithm takes a sequence $S_0$ of large length N obtained by any good algorithm, such
as the SIKH algorithm (Moharir et al 1996; Moharir & Rao 1996). This sequence has a good
merit factor. The algorithm then embeds the segment s in it at the prescribed place
determined by j; that is, s overwrites the elements j to j + n - 1 of $S_0$. The segment s
could itself be a sequence obtained by any good algorithm. This step alters $S_0$ to $S_a$
and lowers its merit factor. Next, $S_a$ is improved by successive applications of the
Hamming scan without touching the elements j to j + n - 1. This restricted Hamming scan
mutates the elements 0 to j - 1 and j + n to N - 1, one at a time. It finds which of these
Hamming neighbours has the largest merit factor. If that merit factor is greater than the
starting merit factor, the algorithm moves to that Hamming neighbour. The procedure is
continued until the merit factor stops improving, and the resulting sequence is S. Thus the
merit factor of the marked segment s is good because the segment is embedded directly. The
merit factor of the entire sequence S is good because it has been improved by the recursive
restricted Hamming scan.
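The embedding and restricted scan can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the authors' implementation:

- The alphabet is ternary, {-1, 0, 1}.
- The merit factor follows the definition in section 4.
- The function names `embed`, `restricted_hamming_scan` and `self_cooperative` are hypothetical.
- Barker-13 merely stands in for a long SIKH sequence.
- The energy-efficiency safeguard discussed in section 4 is deliberately left out here.

```python
# Hypothetical sketch: embed a marked segment, then run a restricted Hamming scan.

def merit_factor(seq):
    """C_0^2 over the sidelobe energy of the aperiodic autocorrelation (both sides)."""
    n = len(seq)
    c = [sum(seq[i] * seq[i + k] for i in range(n - k)) for k in range(n)]
    sidelobe_energy = 2 * sum(ck * ck for ck in c[1:])
    if sidelobe_energy == 0:
        return float("inf") if c[0] > 0 else 0.0
    return c[0] ** 2 / sidelobe_energy


def embed(s0, s, j):
    """Overwrite elements j .. j + n - 1 of S_0 with the marked segment s, giving S_a."""
    if j < 0 or j + len(s) > len(s0):
        raise ValueError("the marked segment does not fit at position j")
    sa = list(s0)
    sa[j:j + len(s)] = s
    return sa


def restricted_hamming_scan(seq, j, n, alphabet=(-1, 0, 1)):
    """Recursive Hamming scan that never touches elements j .. j + n - 1."""
    free = [i for i in range(len(seq)) if not j <= i < j + n]
    current = list(seq)
    best_f = merit_factor(current)
    while True:
        best_move, best_move_f = None, best_f
        for i in free:  # only positions 0 .. j-1 and j+n .. N-1 are mutated
            original = current[i]
            for value in alphabet:
                if value == original:
                    continue
                current[i] = value
                f = merit_factor(current)
                if f > best_move_f:
                    best_move, best_move_f = (i, value), f
            current[i] = original  # undo the trial mutation
        if best_move is None:  # merit factor has stopped improving
            return current, best_f
        i, value = best_move
        current[i] = value
        best_f = best_move_f


def self_cooperative(s0, s, j):
    """S_0 -> S_a (embedding) -> S (restricted recursive Hamming scan)."""
    sa = embed(s0, s, j)
    return restricted_hamming_scan(sa, j, len(s))


s0 = [1, 1, 1, 1, 1, -1, -1, 1, 1, -1, 1, -1, 1]  # stand-in for a long SIKH sequence
S, f = self_cooperative(s0, [1, 1, -1], 5)
```

Freezing the core is done by building the list of free positions once. The inner loop therefore cannot alter the embedded segment, whatever the merit factor would gain.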


4. Results 

The results are first illustrated with an example of moderate length. The marked segment s
is a ternary sequence of length 23 with a merit factor of 20.0556. It is embedded centrally
in a sequence $S_0$ of length 33. Because s is embedded centrally, it can be called the core.
Before this embedding, the merit factor of $S_0$ is 19.5313. After embedding, the merit
factor falls, but the recursive restricted Hamming scan improves it to 5.25. One precaution
must be taken in using this algorithm. The restricted Hamming scan can improve the merit
factor by converting most of the elements outside the core to zero, that is, at the cost of
energy efficiency. Therefore, the algorithm should be stopped before the energy efficiency
becomes too low. We have set an energy efficiency threshold of 0.6000.
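One way to enforce the threshold is sketched below. It makes three assumptions:

- Energy efficiency means Σ s_i^2 / N, which for a ternary sequence is the fraction of nonzero elements.
- Any Hamming neighbour that would fall below 0.6 is discarded.
- The scan stops when no admissible neighbour improves the merit factor.

This is only one reading of "stopping" the algorithm, and the function names are hypothetical.

```python
# Hypothetical sketch of an energy-efficiency guard for a restricted Hamming scan.

def energy_efficiency(seq):
    # ASSUMPTION: energy efficiency = (sum of s_i^2) / N, i.e. the fraction of
    # nonzero elements for a ternary sequence.
    return sum(x * x for x in seq) / len(seq)


def merit_factor(seq):
    n = len(seq)
    c = [sum(seq[i] * seq[i + k] for i in range(n - k)) for k in range(n)]
    sidelobe_energy = 2 * sum(ck * ck for ck in c[1:])
    if sidelobe_energy == 0:
        return float("inf") if c[0] > 0 else 0.0
    return c[0] ** 2 / sidelobe_energy


def guarded_step(seq, free, threshold=0.6, alphabet=(-1, 0, 1)):
    """Best improving Hamming neighbour over the free positions whose energy
    efficiency stays at or above the threshold; None means stop."""
    best, best_f = None, merit_factor(seq)
    for i in free:
        for value in alphabet:
            if value == seq[i]:
                continue
            candidate = list(seq)
            candidate[i] = value
            if energy_efficiency(candidate) < threshold:
                continue  # would trade too much energy for merit factor
            f = merit_factor(candidate)
            if f > best_f:
                best, best_f = candidate, f
    return best


b13 = [1, 1, 1, 1, 1, -1, -1, 1, 1, -1, 1, -1, 1]
core = range(4, 9)  # marked core, held fixed
free = [i for i in range(len(b13)) if i not in core]
seq = b13
while (nxt := guarded_step(seq, free)) is not None:
    seq = nxt
```

Each accepted step strictly increases the merit factor over a finite set of sequences, so the loop always terminates.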

The merit factor of the core s is no longer a measure of its goodness. This is because s is
embedded in a bigger sequence S and will be cross-correlated with the returned version of S,
not of s alone. Hence, we propose the following quantitative measure of the goodness of the
self-co-operative sequence S with marked core s. Let the cross-correlations of S and s with
the return signal be $C_S$ and $C_s$. Let their Schur (component-wise) product be C, called
the Schur correlation. The merit factor F of a sequence S is defined as the ratio of the
energy in the main peak of its aperiodic autocorrelation to the total energy in all its
sidelobes. Analogously, the Schur merit factor SF of the self-co-operative sequence S with
the marked segment s is defined as the ratio of the energy in the main peak of the Schur
correlation C to the total energy in all its sidelobes. Figures 1A and 1C show the
autocorrelations of s and S. Figures 1B and 1D show $C_s$ (which is different from the
autocorrelation of s) and C. It can be seen that C is very sharply peaked. The relevant
lengths and merit factors are indicated in the figure.
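The two measures can be computed directly from these definitions. The merit factor is F = C(0)^2 / Σ_{k≠0} C(k)^2 for the autocorrelation C. The Schur merit factor applies the same ratio to C(k) = C_S(k) C_s(k).

The Python sketch below makes two assumptions. First, the return signal is an exact, undelayed copy of S. Second, C_s is computed with s zero-padded at its position in S, so that both main peaks fall at zero lag. The function names are hypothetical.

```python
# Hypothetical sketch of the Schur correlation and the Schur merit factor.

def cross_correlation(reference, received):
    """C(k) = sum_i received[i + k] * reference[i] for k = -(N-1) .. N-1.

    Both inputs have length N; the result is indexed by k + N - 1,
    so the zero-lag value sits at index N - 1.
    """
    n = len(reference)
    return [
        sum(received[i + k] * reference[i] for i in range(n) if 0 <= i + k < n)
        for k in range(-(n - 1), n)
    ]


def peak_to_sidelobe_energy(corr, zero_lag):
    """Energy of the main peak over the total energy of all sidelobes."""
    peak_energy = corr[zero_lag] ** 2
    sidelobe_energy = sum(x * x for x in corr) - peak_energy
    return float("inf") if sidelobe_energy == 0 else peak_energy / sidelobe_energy


def merit_factor(S):
    """F: the autocorrelation is the cross-correlation of S with a copy of itself."""
    return peak_to_sidelobe_energy(cross_correlation(S, S), len(S) - 1)


def schur_correlation(S, s, j, received=None):
    if received is None:
        received = list(S)  # ASSUMPTION: noiseless, undistorted return of S
    core_ref = [0] * len(S)
    core_ref[j:j + len(s)] = s  # s zero-padded at its place in S
    c_big = cross_correlation(S, received)  # C_S
    c_core = cross_correlation(core_ref, received)  # C_s
    return [a * b for a, b in zip(c_big, c_core)]  # Schur (component-wise) product


def schur_merit_factor(S, s, j, received=None):
    return peak_to_sidelobe_energy(schur_correlation(S, s, j, received), len(S) - 1)


b13 = [1, 1, 1, 1, 1, -1, -1, 1, 1, -1, 1, -1, 1]
print(merit_factor(b13), schur_merit_factor(b13, b13[4:9], 4))
```

Passing a noisy or delayed `received` sequence would let the same functions model a non-ideal return. That case is outside what the text above describes.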

Figure 2 is a similar figure, except that the length of the core s is now 275 and the
length of the whole sequence S is 625. The Schur correlation C is even more sharply peaked
because both s and S are long; in fact, no sidelobes are visible at all. The earlier example
was chosen merely to demonstrate the ideas at an acceptable horizontal and vertical
resolution in the figures.

Table 1 lists the results obtained with a core of length 336. Results not reported here
indicate that longer cores give better results.
Figure 1. (A) Autocorrelation of the marked core s. (B) Cross-correlation of the marked
core s with the returned version of S. Note that this is different from the autocorrelation
of s and depends on both s and S. (C) Autocorrelation of the self-co-operative sequence
S. (D) Schur correlation of S and s. Various lengths and merit factors are shown.
Figure 2. Same as figure 1, except that the marked core and the self-co-operative
sequence are much longer.
Table 1. Details of some good self-co-operative sequences with marked core of length 336.
Merit factor of the core = 10.05

Length  Energy efficiency  Merit factor  Schur merit factor
672     0.600              4.629         5166.127
728     0.636              4.693         2324.526
736     0.609              4.652         4415.978
750     0.600              4.633         4767.170
768     0.602              4.622         4167.685
780     0.629              4.749         4673.124
800     0.646              4.447         4121.099
806     0.614              4.480         3436.321
816     0.643              4.580         3360.117
828     0.629              4.681         4068.622
832     0.624              4.958         4651.332
840     0.648              4.723         3654.543
848     0.642              4.478         3716.679
858     0.619              4.981         3759.786
868     0.624              4.851         3902.958
874     0.610              4.621         3520.839
896     0.625              4.847         3963.063
910     0.616              5.250         4400.841
912     0.651              4.788         3585.806
924     0.603              4.845         2734.265
928     0.655              4.792         4002.008
936     0.625              5.230         4382.247
954     0.633              4.736         3222.210
960     0.645              5.255         4235.305
966     0.619              5.296         4814.584
972     0.640              5.253         4294.925
980     0.612              5.566         4244.196
990     0.629              5.128         3638.130
992     0.626              5.017         4514.279
1058    0.659              5.184         3726.345


5. Conclusion 

The signal design problem can be solved more satisfactorily if part of the burden of
obtaining good results is shared by additional signal processing at the receiver. Here
is a scheme in which only one sequence is transmitted and yet the advantages of
co-operation are available. Such a coincidence detection scheme would be readily usable.
As ternary sequences support better merit factors, they also offer better self-co-operative
sequences.

The restricted Hamming scan concept can be readily extended to the emplacement of more than
one marked segment in a sequence. It can also be applied recursively: the self-co-operative
sequence S can itself serve as the marked segment of a longer self-co-operative sequence. We
have found that a marked core performs better than a marked prefix in the self-co-operative
sequence.


The authors are grateful to Dr H K Gupta of the National Geophysics Research Institute 
and Prof R V B Chary of the Osmania University for encouragement and support. 